Chelsio T520 not working as WAN interface



  • Vanilla pfSense 2.4.4 installation on amd64.

    I'm using a Chelsio T520-LL-CR 2x SFP+ 10G NIC with one port as my WAN interface and the other as LAN. The WAN IP is obtained through PPPoE. Whenever I use the Chelsio card for WAN/PPPoE, I can dial a PPPoE session, get an IP, and make DNS queries, but absolutely no TCP traffic goes through. Packets go out, but no replies come back. No issues when using onboard Intel NIC or when using the Chelsio card in Ubuntu.

    T520 is connected to a Ubiquiti ES-16-XG using certified DAC cables, and the 10G link is stable
    LAN works fine on either port, WAN works on neither.
    Using Intel NIC instead of Chelsio works fine for WAN+PPPoE
    Booting into Ubuntu and dialing a PPPoE session with the Chelsio card works fine.

    Tried:

    Reinstalling Pfsense
    Switching to SFP+ optical instead of DAC
    Alternating ports on the switch
    Alternating ports on the card
    Disabling/enabling virtually all combinations of offload / checksum acceleration features

    I am convinced it is a problem with pfSense and/or the FreeBSD drivers. I can sit at the pfSense console for hours flipping PPPoE between igb0 and cxl0, watching igb0 work fine and cxl0 not work, all while cxl1 is working fine for LAN. Any help would be greatly appreciated!



  • Card details:

    dev.t5nex.0.scfg_version: 289427456
    dev.t5nex.0.bs_version: 1.1.0.0
    dev.t5nex.0.er_version: 1.0.0.68
    dev.t5nex.0.na: 00074330F890
    dev.t5nex.0.md_version: t4d-0.0.0
    dev.t5nex.0.ec: 0000000000000000
    dev.t5nex.0.pn: 110116750D1
    dev.t5nex.0.sn: PT14150258
    dev.t5nex.0.hw_revision: 0
    dev.t5nex.0.firmware_version: 1.19.1.0
    dev.t5nex.0.tp_version: 0.1.4.9


  • Netgate Administrator

    Double check you have a default route via the PPPoE gateway when you use the Chelsio WAN.
    You might have a static route to the DNS servers allowing that to work whilst other traffic does not.

    If you don't, set the PPPoE connection as the default specifically in System > Routes rather than leaving it on automatic.

    Steve
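
For anyone following along, the default-route check suggested here can also be scripted. The following is only a minimal sketch in Python, assuming the text comes from `netstat -rn -f inet` on the firewall shell and that the columns are Destination, Gateway, Flags, Netif, as FreeBSD 11 prints them. The function name `default_route` and the sample text are made up for illustration.

```python
# Illustrative sample only; real input would be the output of `netstat -rn -f inet`.
SAMPLE_ROUTES = """\
Destination        Gateway            Flags     Netif Expire
default            10.11.1.113        UGS      pppoe0
10.11.1.113        link#9             UH       pppoe0
192.168.1.0/24     link#1             U          cxl0
"""


def default_route(netstat_output):
    """Return (gateway, netif) for the first 'default' row, or None if there is none.

    Assumes whitespace-separated columns in the order
    Destination, Gateway, Flags, Netif (hypothetical helper, not a pfSense API).
    """
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "default":
            return fields[1], fields[3]
    return None
```

If the returned interface is not the PPPoE interface (for example `pppoe0`), or the function returns `None`, the default route is worth fixing before looking any further.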



  • Thanks for the suggestion. I took a quick peek at the routes and gateways and everything looks good. I explicitly selected WAN_PPPOE as the default IPv4 gateway in System->Routes, no change.

    Routing tables
    
    Internet:
    Destination        Gateway            Flags     Netif Expire
    default            10.11.1.113        UGS      pppoe0
    10.11.1.113        link#9             UH       pppoe0
    50.101.75.22       link#9             UHS         lo0
    127.0.0.1          link#6             UH          lo0
    192.168.1.0/24     link#1             U          cxl0
    192.168.1.1        link#1             UHS         lo0
    

    And this is the output of the PPPoE connection info while using the Chelsio card. It looks identical to the Intel NIC, except for the fact that it isn't working, of course. In/out packet counts are high because the stats do not appear to be reset when changing between parent interfaces (I had igb0 running for 7-8 hours racking up the packet counts).

    Status
        up
    PPPoE
        up 
    Uptime
        00:03:06
    IPv4 Address
        50.101.73.xxx
    Subnet mask IPv4
        255.255.255.255
    Gateway IPv4
        10.11.1.113
    IPv6 Link Local
        fe80::7285:c2ff:fe80:dcd3%igb0
    DNS servers
        127.0.0.1
        8.8.4.4
        8.8.8.8
    MTU
        1492
    In/out packets
        9334212/24963876 (3.17 GiB/28.43 GiB)
    In/out packets (pass)
        9334212/24963876 (3.17 GiB/28.43 GiB)
    In/out packets (block)
        3286/7232 (301 KiB/493 KiB)
    In/out errors
        0/0
    Collisions
        0
    



  • Netgate Administrator

    Is it an MTU issue somewhere perhaps? DNS being relatively small.
    Can you ping across it? At arbitrary packet sizes?

    Steve
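
Testing "arbitrary packet sizes" by hand gets tedious, so here is a rough sketch of automating it in Python with a binary search. It assumes FreeBSD's `ping`, where `-D` sets the Don't Fragment bit and `-s` sets the number of ICMP data bytes, and it assumes the failure is a clean threshold (every size up to some limit passes, everything above fails). `ping_ok` and `largest_passing_payload` are hypothetical helper names, not part of pfSense.

```python
import subprocess


def ping_ok(host, size, timeout_s=3):
    """Send one DF-flagged echo request with `size` data bytes.

    Returns True only if ping exits 0 (a reply was received).
    Assumes FreeBSD ping flags: -c count, -D don't fragment, -s data size.
    """
    cmd = ["ping", "-c", "1", "-D", "-s", str(size), host]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def largest_passing_payload(probe, low, high):
    """Binary search for the largest size in [low, high] where probe(size) is True.

    Assumes a single threshold: sizes at or below it pass, larger ones fail.
    Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Example use against a real host (not executed here):
#   largest_passing_payload(lambda n: ping_ok("4.2.2.1", n), 0, 1464)
```

Passing the probe in as a function keeps the search logic separate from the network call, so it can be checked with a fake probe before being pointed at a live link.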



  • I think you're onto something. This is messed up!!! On igb0 I can send pings as high as 1460 bytes. On cxl0 it craps out somewhere around 100-120.

    Using pppoe0 on cxl0 MTU 1492

    [2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 1500 4.2.2.1
    PING 4.2.2.1 (4.2.2.1): 1500 data bytes
    36 bytes from localhost (127.0.0.1): frag needed and DF set (MTU 1492)
    Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
     4  5  00 05f8 0000   0 0000  40  01 2f03 64.229.252.100  4.2.2.1
    
    ^C
    --- 4.2.2.1 ping statistics ---
    1 packets transmitted, 0 packets received, 100.0% packet loss
    [2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 1460 4.2.2.1
    PING 4.2.2.1 (4.2.2.1): 1460 data bytes
    ^C
    --- 4.2.2.1 ping statistics ---
    1 packets transmitted, 0 packets received, 100.0% packet loss
    [2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 100 4.2.2.1
    PING 4.2.2.1 (4.2.2.1): 100 data bytes
    108 bytes from 4.2.2.1: icmp_seq=0 ttl=57 time=15.933 ms
    
    --- 4.2.2.1 ping statistics ---
    1 packets transmitted, 1 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 15.933/15.933/15.933/0.000 ms
    [2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 120 4.2.2.1
    PING 4.2.2.1 (4.2.2.1): 120 data bytes
    ^C
    --- 4.2.2.1 ping statistics ---
    1 packets transmitted, 0 packets received, 100.0% packet loss
    
    

    Using pppoe0 on igb0 MTU 1492

    [2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 1500 4.2.2.1
    PING 4.2.2.1 (4.2.2.1): 1500 data bytes
    36 bytes from localhost (127.0.0.1): frag needed and DF set (MTU 1492)
    Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
     4  5  00 05f8 0000   0 0000  40  01 2f03 142.113.146.2  4.2.2.1
    
    ^C
    --- 4.2.2.1 ping statistics ---
    1 packets transmitted, 0 packets received, 100.0% packet loss
    [2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 1460 4.2.2.1
    PING 4.2.2.1 (4.2.2.1): 1460 data bytes
    1468 bytes from 4.2.2.1: icmp_seq=0 ttl=51 time=17.066 ms
    
    --- 4.2.2.1 ping statistics ---
    1 packets transmitted, 1 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 17.066/17.066/17.066/0.000 ms
    
    

  • Netgate Administrator

    Ok, well that's good and bad I guess. 😉

    Check the ifconfig -a output. What's the MTU on the cxl interface?

    Steve
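
If it helps, the MTU of every interface can be pulled out of the `ifconfig -a` text in one go. This is a small sketch in Python, assuming the FreeBSD header-line format `name: flags=... metric N mtu N`. The sample text is illustrative rather than captured from the system in this thread, and `interface_mtus` is a made-up helper name.

```python
import re

# Illustrative sample only; real input would be the output of `ifconfig -a`.
SAMPLE_IFCONFIG = """\
cxl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tstatus: active
pppoe0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
"""

# Matches only interface header lines, which start at column 0 with "name: flags=".
_HEADER = re.compile(r"^(\S+): flags=.*\bmtu (\d+)", re.MULTILINE)


def interface_mtus(ifconfig_output):
    """Map interface name to MTU (int) for each header line found."""
    return {name: int(mtu) for name, mtu in _HEADER.findall(ifconfig_output)}
```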



  • MTU on the cxl interface is 1500 and the PPPoE interface is 1492. Did a bit more testing. The specific point where it drops out is 117 bytes of data (a 145-byte IP packet). ping -s 116 works but nothing higher does.
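
For reference, the size arithmetic behind these numbers can be written out as a short Python sketch. It assumes a plain 20-byte IPv4 header with no options, the 8-byte ICMP echo header, and 8 bytes of PPPoE/PPP overhead on a 1500-byte Ethernet MTU. The function names are made up for illustration.

```python
IPV4_HEADER = 20      # bytes, assuming no IP options
ICMP_HEADER = 8       # bytes, ICMP echo header
PPPOE_OVERHEAD = 8    # bytes, 6-byte PPPoE header + 2-byte PPP protocol field
ETHERNET_MTU = 1500

# 1500 - 8 = 1492, the MTU reported on the PPPoE interface
PPPOE_MTU = ETHERNET_MTU - PPPOE_OVERHEAD


def ip_packet_size(icmp_data_bytes):
    """Total IPv4 packet length for a ping with the given `-s` data size."""
    return icmp_data_bytes + ICMP_HEADER + IPV4_HEADER


def max_icmp_data(mtu):
    """Largest `-s` value that fits in one unfragmented packet at this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER
```

With these definitions, `ping -s 116` is a 144-byte IP packet and `ping -s 117` is a 145-byte one, which matches the drop-out point reported here. `ping -s 1500` gives 1528 bytes, which is the `Len 05f8` shown in the "frag needed" output earlier in the thread.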



  • Just tried pfSense 2.4.3, which has a slightly older cxgbe driver and firmware. No change :( Also tried forcing a PCIe Gen3 link in the BIOS, which fixed a vaguely similar issue for someone else. No luck. I did confirm the card was already linking at PCIe 3.0 x8 to begin with.



  • Popped in a Myricom 10G card and it works fine. Seems clear that there is a problem with the FreeBSD Chelsio drivers when used with PPPoE. Chelsio card works fine in Ubuntu, Myricom card works fine in FreeBSD.

    Any thoughts on next steps?



  • @mikenemat said in Chelsio T520 not working as WAN interface:

    Any thoughts on next steps?

    Contact the FreeBSD developers.



  • @grimson said in Chelsio T520 not working as WAN interface:

    @mikenemat said in Chelsio T520 not working as WAN interface:

    Any thoughts on next steps?

    Contact the FreeBSD developers.

    Hoping to find some obscure explanation and workaround for the problem. Really struggling to accept that this is a driver bug, despite all signs pointing to it. I can't find anyone else who's run into this problem. This is the Netgate-recommended 10G NIC for pfSense, so you'd think if it were a problem with the driver, it would be widespread. PPPoE is common with many FTTH providers.



  • @mikenemat said in Chelsio T520 not working as WAN interface:

    PPPoE is common with many FTTH providers.

    Maybe in your area, but not on a global scale. And certainly not with 10G.



  • Replaced the T520 with an Intel X520-DA2. Worked perfectly the first time. Chelsio = garbage.


  • Netgate Administrator

    It's certainly very unusual to be using a 10G card of any type for PPPoE. Do you really have a >1Gbps connection using that?

    You could well be the only person using it in which case any bugs that might exist for that combination simply may not have been discovered, until now.

    The next step here would be to test it with FreeBSD 11.2 and if it fails there report it upstream.

    Steve



  • @stephenw10

    Yep. I really am :) Bell Canada's new 1.5 Gbps FTTH service works over PPPoE. Apparently this is very common in Japan as well. Their supplied equipment links an SFP GPON module at 2.5 Gbps. However, their supplied equipment is junk and, aside from many other issues, the only way to reach 1.5 Gbps is to use the wired 1 Gbps LAN interface and the wireless interface simultaneously, since it is not a 10G appliance. This is not ideal.

    So in order to move past this, I'm using a Ubiquiti ES-16-XG to link the SFP GPON module. The ES-16-XG is one of the rare pieces of hardware to support linking an SFP module at 2.5 Gbps. I've configured it to strip the VLAN tag (35) and pass the traffic to pfSense over a proper 10G link. At that point, I'm able to establish a PPPoE session and utilize the full 1.5 Gbps over a 10G LAN interface.

    Forgive my frustration - this issue has been a nightmare to troubleshoot and my appetite for taking down my internet connection for hours at a time (which I use for business as well) is dwindling. I'm going to resell the Chelsio card to someone who hopefully has a more conventional use case. The Intel X520-DA2 has been working flawlessly for over 24 hours now. I think I prefer the Intel card anyways, much less firmware-driven magic going on.

    [Attached image: speedtest.png]


  • Netgate Administrator

    I understand. Interesting use case.

    Steve