Latency spike when pushing GBit LAN<->LAN (VLAN issue?)



  • Hi,

    I have this box: http://www.jetwayipc.com/product/jbc375f533w-1900-b4/, with 4x Intel I211-AT Gigabit LAN ports and pfSense 2.4.4 installed.

    I have 2 WAN interfaces and 2 LAN interfaces.

    When I push GBit with iperf3 from server1 (connected to LAN1) to server2 (connected to LAN2), there are enormous latency spikes on the interfaces. Even WAN and/or WAN2 spike until they go offline. Which of the two is affected seems random: sometimes WAN1 spikes and goes down, sometimes WAN2.

    I have tried some tweaking, but it seems to have little or no effect. I tried the following, taken from https://docs.netgate.com/pfsense/en/latest/interfaces/low-throughput-troubleshooting.html :

    # sysctl net.inet.tcp.tso=0
    # sysctl hw.pci.enable_msix=0               # force MSI instead of MSI-X
    # sysctl net.inet.ip.intr_queue_maxlen=3000
    

    I did this assuming the sysctl commands take effect immediately and don't need a reboot (I didn't reboot).

    What I can still try:

    # sysctl hw.igb.num_queues=1                # set queues per card to 1
    # sysctl hw.igb.fc_setting=0                # flow control off
    

    I'm kinda stuck here. With a 500 MBit CodelQ limiter enabled (floating rule) there seems to be no problem, but at higher speeds the NICs just can't seem to handle it. Do you recognize this behaviour? Any ideas how to proceed? Thanks very much!

    EDIT:

    • WAN2 also has GBit internet, and the same thing happens when running iperf over WAN to another server.
    • CPU usage stays fairly low during iperf (I believe). The IRQ processes for the interfaces peak at about 20% CPU.

    EDIT2: I think it has something to do with VLANs: https://forum.netgate.com/post/835456



  • IIRC, boxes like this, and other Netgate boxes prior to the latest ARM models, are not built like a typical network switch. Passing packets from interface to interface, even when they are bridged like a switch without any filtering, can create significant load on the box. Poke around the forum and you'll find recommendations to use a switch with VLANs instead of using the pfSense box this way.



  • Thanks. This two-server setup is only for debugging/testing. My goal is not to switch between two LANs per se.
    I ran into this trouble when using the GBit WAN interface from the LAN. In my experience, using a GBit WAN from a single LAN client should be no problem.

    I have tried iperf server1<->server2 on the same LAN (switch, no router) and that gave me stable GBit. With only the router in between, the trouble appears, so it must be caused by the router.



  • @jelle-e said in Latency spike when pushing GBit LAN<->LAN:

    # sysctl hw.pci.enable_msix=0               # force MSI instead of MSI-X
    

    This was done assuming the sysctl commands will take effect immediately and doesn't need a reboot. (I didn't reboot)

    Well, your assumption is wrong.
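    The MSI/MSI-X and igb queue settings are picked up when the NICs attach at boot, so on pfSense they are normally put in /boot/loader.conf.local (pfSense leaves that file alone on upgrades, unlike loader.conf) and followed by a reboot. A minimal sketch, assuming the tunable names used earlier in the thread still apply to the igb driver in pfSense 2.4.4; hw.igb.fc_setting in particular is an assumption worth checking (e.g. with `sysctl -d hw.igb.fc_setting`) before relying on it:

    ```
    # /boot/loader.conf.local: read by the loader at boot, so changes need a reboot.
    # loader.conf syntax: name="value"
    hw.pci.enable_msix="0"    # use MSI instead of MSI-X for PCI devices
    hw.igb.num_queues="1"     # assumption: igb tunable, 1 queue per card
    hw.igb.fc_setting="0"     # assumption: igb tunable, flow control off
    ```

    After the reboot, `vmstat -i` lists the interrupt sources; with MSI-X disabled, each igb port should show a single interrupt line rather than one per queue.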



  • I just found out it must have something to do with VLANs.

    My setup:
    4 cores @ 2 GHz / 4 GB RAM / 4x I211-AT NIC
    2x WAN (1x 250MBit / 1x GBit)
    2x LAN
    2x Netgear GS724T switches

    Case 1: with VLANs (issues)
    -> Server1 connected to LAN1 (no vlan)
    -> Server2 connected to LAN2 (with tagged VLAN behind the 2 Netgear switches)

    • Running iperf3 between the two servers brings down the WAN1 interface! (Strange, because no traffic should be going over that interface.)
    • Running iperf3 from server2 over WAN2 to another iperf3 server brings down WAN1. (Note: the traffic goes over WAN2, but WAN1's latency spikes and its gateway goes down.)
      Latency spikes above 1000 ms and then the WAN goes down.

    Case 2: no VLANs (no issues)
    -> Server1 connected to LAN1 (no vlan)
    -> Server2 connected to LAN2 (NO VLAN behind the 2 Netgear switches)

    • Doing iperf3 between the two servers is giving me GBit.
    • Doing iperf3 from server2 over WAN2 is giving me GBit WAN.
      Latency doesn't spike above 10ms.

    So, my conclusion is that the VLANs cause serious problems at high throughput. I couldn't find many VLAN-related tweaks in pfSense, so I hope someone has an idea. Thanks!

    EDIT: In Case 1 it's not WAN2 that goes down but WAN1; it's always WAN1. Is there something about the default gateway being used by all VLANs?

