Chelsio T520 not working as WAN interface
Vanilla pfSense 2.4.4 installation on amd64.
I'm using a Chelsio T520-LL-CR 2x SFP+ 10G NIC with one port as my WAN interface and the other as LAN. The WAN IP is obtained through PPPoE. Whenever I use the Chelsio card for WAN/PPPoE, I can dial a PPPoE session, get an IP, and make DNS queries, but absolutely no TCP traffic goes through. Packets go out, but no replies come back. No issues when using onboard Intel NIC or when using the Chelsio card in Ubuntu.
T520 is connected to a Ubiquiti ES-16-XG using certified DAC cables, 10g link is stable
LAN works fine on either port, WAN works on neither.
Using Intel NIC instead of Chelsio works fine for WAN+PPPoE
Booting into Ubuntu and dialing a PPPoE session with the Chelsio card works fine.
Things I've tried, none of which made a difference:
Switching to SFP+ optical instead of DAC
Alternating ports on the switch
Alternating ports on the card
Disabling/enabling virtually all combinations of offload / checksum acceleration features
I am convinced it is a problem with pfSense and/or the FreeBSD drivers. I can sit at the pfSense console for hours flipping PPPoE between igb0 and cxl0, watching igb0 work fine and cxl0 not work, all the while cxl1 is working fine for LAN. Any help would be greatly appreciated!
Double check you have a default route via the PPPoE gateway when you use the Chelsio WAN.
You might have a static route to the DNS servers allowing those queries to work whilst other traffic does not.
If you don't, set the PPPoE connection as the default gateway specifically in System > Routes rather than leaving it on automatic.
Thanks for the suggestion. I took a quick peek at the routes and gateways and everything looks good. I explicitly selected WAN_PPPOE as the default ipv4 gateway in System->Routes, no change.
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            10.11.1.113        UGS      pppoe0
10.11.1.113        link#9             UH       pppoe0
126.96.36.199      link#9             UHS         lo0
127.0.0.1          link#6             UH          lo0
192.168.1.0/24     link#1             U          cxl0
192.168.1.1        link#1             UHS         lo0
And this is the output of the PPPoE connection info while using the Chelsio card. It looks identical to the Intel NIC, except the fact it isn't working of course. In/out packet counts are high because the stats do not appear to be reset when changing between parent interfaces (I had igb0 running for 7-8 hours racking up the packet counts).
Status                  up
PPPoE                   up
Uptime                  00:03:06
IPv4 Address            50.101.73.xxx
Subnet mask IPv4        255.255.255.255
Gateway IPv4            10.11.1.113
IPv6 Link Local         fe80::7285:c2ff:fe80:dcd3%igb0
DNS servers             127.0.0.1 188.8.131.52 184.108.40.206
MTU                     1492
In/out packets          9334212/24963876 (3.17 GiB/28.43 GiB)
In/out packets (pass)   9334212/24963876 (3.17 GiB/28.43 GiB)
In/out packets (block)  3286/7232 (301 KiB/493 KiB)
In/out errors           0/0
Collisions              0
Is it an MTU issue somewhere perhaps? DNS being relatively small.
Can you ping across it? At arbitrary packet sizes?
I think you're onto something. This is messed up!!! On igb0 I can send pings as high as 1460 bytes. On cxl0 it craps out somewhere around 100-120.
Using pppoe0 on cxl0 MTU 1492
[2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 1500 220.127.116.11
PING 18.104.22.168 (22.214.171.124): 1500 data bytes
36 bytes from localhost (127.0.0.1): frag needed and DF set (MTU 1492)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 05f8 0000 0 0000 40 01 2f03 126.96.36.199 188.8.131.52
^C
--- 184.108.40.206 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
[2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 1460 220.127.116.11
PING 18.104.22.168 (22.214.171.124): 1460 data bytes
^C
--- 126.96.36.199 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
[2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 100 188.8.131.52
PING 184.108.40.206 (220.127.116.11): 100 data bytes
108 bytes from 18.104.22.168: icmp_seq=0 ttl=57 time=15.933 ms
--- 22.214.171.124 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 15.933/15.933/15.933/0.000 ms
[2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 120 126.96.36.199
PING 188.8.131.52 (184.108.40.206): 120 data bytes
^C
--- 220.127.116.11 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
Using pppoe0 on igb0 MTU 1492
[2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 1500 18.104.22.168
PING 22.214.171.124 (126.96.36.199): 1500 data bytes
36 bytes from localhost (127.0.0.1): frag needed and DF set (MTU 1492)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 05f8 0000 0 0000 40 01 2f03 188.8.131.52 184.108.40.206
^C
--- 220.127.116.11 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
[2.4.4-RELEASE][admin@pfSense.localdomain]/root: ping -c 1 -D -s 1460 18.104.22.168
PING 22.214.171.124 (126.96.36.199): 1460 data bytes
1468 bytes from 188.8.131.52: icmp_seq=0 ttl=51 time=17.066 ms
--- 184.108.40.206 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 17.066/17.066/17.066/0.000 ms
Ok, well that's good and bad I guess.
Can you post the ifconfig -a output? What's the MTU on the cxl interface?
MTU on the cxl interface is 1500 and the PPPoE interface is 1492. Did a bit more testing. The specific point where it drops out is 117 bytes of ICMP data (a 145-byte IP packet once the 8-byte ICMP header and 20-byte IP header are added). ping -s 116 works but nothing higher.
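For anyone following the numbers, here is a minimal sketch of the size arithmetic behind these ping tests. It assumes plain IPv4 with no IP options, and the function names are just illustrative. The value given to ping -s is the ICMP payload only, so the IP packet is 28 bytes larger, which is also why the 1500-byte attempt shows a length of 05f8 (1528) in the "frag needed" dump.

```python
ICMP_HEADER = 8    # bytes, ICMP echo header
IPV4_HEADER = 20   # bytes, assuming no IP options

def ip_packet_size(payload: int) -> int:
    """Total IPv4 packet length for a given `ping -s` payload size."""
    return payload + ICMP_HEADER + IPV4_HEADER

def max_payload(mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented packet."""
    return mtu - ICMP_HEADER - IPV4_HEADER

print(ip_packet_size(117))    # 145, the first size that fails on cxl0
print(ip_packet_size(1500))   # 1528, i.e. 0x05f8 in the header dump
print(max_payload(1492))      # 1464, the ceiling for the PPPoE MTU
```

So on a healthy PPPoE link, anything up to -s 1464 with DF set should get through, and the cutoff at 116 on cxl0 is nowhere near an ordinary MTU boundary.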
Just tried pfsense 2.4.3 which has a slightly older cxgbe driver and firmware. No change :( Also tried forcing PCIe Gen3 link in the BIOS which fixed a vaguely similar issue for someone else, no luck. I did confirm the card was already linking at PCIe 3.0 x8 to begin with.
Popped in a Myricom 10G card and it works fine. Seems clear that there is a problem with the FreeBSD Chelsio drivers when used with PPPoE. Chelsio card works fine in Ubuntu, Myricom card works fine in FreeBSD.
Any thoughts on next steps?
Contact the FreeBSD developers.
Hoping to find some obscure explanation and workaround for the problem. Really struggling to accept that this is a driver bug, despite all signs pointing to it. I can't find anyone else who's run into this problem. This is the Netgate recommended 10Gb NIC for pfSense, so you'd think if it is a problem with the driver, it would be widespread. PPPoE is common with many FTTH providers.
Replaced the T520 with an Intel X520-DA2. Worked perfectly the first time. Chelsio = garbage.
It's certainly very unusual to be using a 10G card of any type for PPPoE. Do you really have a >1Gbps connection using that?
You could well be the only person using it in which case any bugs that might exist for that combination simply may not have been discovered, until now.
The next step here would be to test it with FreeBSD 11.2 and if it fails there report it upstream.
Yep. I really am :) Bell Canada's new 1.5gbps FTTH service works over PPPoE. Apparently this is very common in Japan as well. Their supplied equipment links a SFP GPON module at 2.5gbps. However, their supplied equipment is junk and aside from many other issues, the only way to reach 1.5gbps is to use the wired 1gbps LAN interface and the wireless interface simultaneously, since it is not a 10G appliance. This is not ideal.
So in order to move past this, I'm using a ubiquiti ES-16-XG to link the SFP GPON module. The ES-16-XG is one of the rare pieces of hardware to support linking a SFP module at 2.5gbps. I've configured it to strip the VLAN tag (35), and pass it to pfsense over a proper 10G link. At that point, I'm able to establish a PPPoE session and utilize the full 1.5gbps over a 10G LAN interface.
Forgive my frustration - this issue has been a nightmare to troubleshoot and my appetite for taking down my internet connection for hours at a time (which I use for business as well) is dwindling. I'm going to resell the Chelsio card to someone who hopefully has a more conventional use case. The Intel X520-DA2 has been working flawlessly for over 24 hours now. I think I prefer the Intel card anyways, much less firmware-driven magic going on.
I understand. Interesting use case.
We are seeing a similar issue, but not related to PPPoE.
WAN on the Chelsio seems to work OK in our case via DAC on 10G.
If we dial in via IPsec, the IPsec connection is established but traffic is partly blocked.
So we can ping the pfSense LAN interface and other devices on the LAN, but cannot reach the web interfaces etc.
Switching the WAN from the Chelsio to the onboard Intel RJ45 solves the problem.
Same issue on two identical hardware setups.
After running pfctl -d (disabling pf) access was possible, even though an IPsec allow-any rule was already present.
Hmm, if you're coming over IPsec then it's not a TCP offloading issue. Perhaps an MTU issue though. Try enabling MSS clamping on IPsec.
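As a rough illustration of what MSS clamping is aiming for, here is a small sketch of the arithmetic. It assumes IPv4 and a TCP header without options. The tunnel_overhead parameter is hypothetical: the real IPsec overhead depends on the cipher, mode and NAT-T, so the 60 used below is a placeholder, not a measured figure.

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
TCP_HEADER = 20    # bytes, assuming no TCP options

def mss_for_mtu(mtu: int, tunnel_overhead: int = 0) -> int:
    """Largest TCP segment payload that fits in one packet on this path."""
    return mtu - tunnel_overhead - IPV4_HEADER - TCP_HEADER

print(mss_for_mtu(1500))                      # 1460, plain Ethernet
print(mss_for_mtu(1492))                      # 1452, PPPoE
print(mss_for_mtu(1500, tunnel_overhead=60))  # 1400, placeholder overhead
```

Clamping the MSS to a value at or below the figure for the tightest link keeps TCP segments small enough that they never need fragmenting inside the tunnel.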
We are not onsite today but I will check MTU settings.
The strange thing is, that disabling pf resolved the issue with blocking access. This shouldn't change MTU settings?
Are you guessing it is MTU related because ping works and probably bigger packets are blocked/dropped?
edit: MTU on cxl (chelsio 10G) and ix (intel 1G) are on 1500 but enc0 is on 1536.
Yes large packets would be my first thought. Disabling pf wouldn't affect that though.
However it does disable pf-scrub. Perhaps you have packet fragments that are not being assembled correctly. Maybe the Chelsio card is doing something with them, it has all sorts of offloading hardware that we don't use.
You can disable pfscrub separately in System > Advanced > Firewall & NAT.
Hello, I am having a very similar issue! I have a Chelsio T404-BT and I am using CenturyLink, whose WAN requires VLAN tag 201 plus PPPoE. This setup worked fine with 2 Intel NICs. Now with the Chelsio card I get a WAN IP and I can traceroute out from pfSense, but I have no internet on my devices.
Any solution to this?
Do you have other connectivity from pfSense itself? Can it check for updates or install packages for example?
If so it's a different issue. Probably no NAT.
@stephenw10 no I cannot get a list of packages either on pfsense
Ok so you can ping out though?
Try pinging out with large packets:
ping -s 1000 -c 3 220.127.116.11
Try different sized packets to see if you really are seeing an MTU issue.
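If you'd rather not try sizes by hand, the search can be automated. The sketch below is only a suggestion, not something from the pfSense tooling: ping_ok shells out to ping with the same FreeBSD flags used earlier in this thread (-c 1 -D -s), and largest_working_payload does a binary search on top of any probe function. It assumes that once a size fails, every larger size fails too, and the 5 second timeout is an arbitrary choice.

```python
import subprocess
from typing import Callable

def ping_ok(host: str, payload: int) -> bool:
    """Send one echo request with DF set (FreeBSD ping flags, as used above)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-D", "-s", str(payload), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,  # arbitrary: give up if no reply arrives
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0

def largest_working_payload(probe: Callable[[int], bool],
                            low: int = 0, high: int = 1472) -> int:
    """Largest payload in [low, high] that the probe accepts, or -1 if none.

    Assumes failures are monotonic: if size N fails, all larger sizes fail.
    """
    if not probe(low):
        return -1
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always shrinks
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

From a Python prompt on the firewall you would call it as largest_working_payload(lambda n: ping_ok("<some host>", n)). A result close to the MTU minus 28 means the path is fine; a result far below that, like the 116 reported above, points at something dropping larger packets.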