Interrupt race conditions on network interface cards



  • Hi,

    Config: pfSense 2.3.4_p1 on a 6-core/12-thread processor with 16 GB RAM and a 4-port Intel i350-T4v2 Gigabit NIC. The internet connection is 500 Mb/s up and down.

    I'm seeing what I think is strange behaviour. I have a Windows backup server that periodically checks remote Windows servers for changes. The backup server is on the internal LAN, and the remote servers are reached through an IPSec tunnel on the WAN. The backup process for each remote server is a robocopy script over an SMB connection. As long as the number of scripts running on the backup server stays under about 8 there are no problems, but once the number of scripts (SMB connections) rises above that, both the WAN and LAN interfaces go into a CPU race condition on the interrupts. Both NICs hit about 80-90% CPU interrupt time, and all traffic between LAN and WAN almost comes to a halt.

    • The SMB traffic is less than 10 Mb/s, so the link is not saturated

    • I have already tried interrupt polling, but it only improves CPU usage on the idle CPUs, not on the troubled WAN and LAN interrupts

    • I used the onboard Broadcom NICs before using the Intel NIC, but it gave the same issue

    • Once I stop the robocopy scripts, the CPU interrupt time stays high for both NICs (at least for some time) and only a restart of the machine resets the race condition

    Any idea why the interrupts take so much CPU time? About 12 SMB connections using about 10 Mb/s seem to pretty much take pfSense down.

    Greetings,
    Gesture



  • :-[ Unfortunately I had another one today: 8 out of the 12 processors were going berserk on the interrupts, while there was only 20 Mb/s of traffic and between 2000 and 5000 pps.
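
For scale, here is a rough back-of-the-envelope sketch of the average packet size implied by those figures. It assumes "Mb/s" means 10^6 bits per second and that the throughput and pps readings cover the same traffic; the function name is purely illustrative.

```python
def avg_packet_bytes(bits_per_second: float, packets_per_second: float) -> float:
    """Average packet size in bytes implied by a throughput and a packet rate."""
    return bits_per_second / packets_per_second / 8


# Figures reported above: 20 Mb/s at between 2000 and 5000 packets per second.
# Assumption: 1 Mb = 1,000,000 bits.
THROUGHPUT_BPS = 20e6

for pps in (2000, 5000):
    size = avg_packet_bytes(THROUGHPUT_BPS, pps)
    print(f"{pps} pps -> {size:.0f} bytes per packet on average")
```

That works out to roughly 500 to 1250 bytes per packet, so the reported load is a few thousand mostly mid-sized to large packets per second.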

