C3K NIC with QinQ interfaces really slow upload

yswery

We have a C3K NIC (ix(4)) running on pfSense 2.4.4-p2.

Everything works fine, BUT we have a few QinQ interfaces set up for downstream DSL/fibre clients, and they seem to be having issues with super slow upload speeds. By that I mean that on a 100/100 connection they get 100 Mbps down, 0.2 Mbps up, and constant "dropouts".

I saw that the https://docs.netgate.com/pfsense/en/latest/hardware/tuning-and-troubleshooting-network-cards.html page mentions ix(4) NICs.

I see that

kern.ipc.nmbclusters="1000000"
kern.ipc.nmbjumbop="524288"

are already in /boot/loader.conf.

This QinQ setup worked fine on our other pfSense box (with some crappy Realtek card), but with this one it seems really bad, weird, and inconsistent.

Can anyone point me in the right direction as to what the issue is and what I can try to resolve it?

stephenw10

If that's the only difference there, I'd look at hardware offloading, specifically VLAN tagging:

ix2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>

re2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>

VLAN_HWTSO maybe, looking at what I have here.
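To compare two capability sets, the hex value after `options=` can be decoded bit by bit. This is a minimal Python sketch, assuming the IFCAP_* bit values from FreeBSD's <net/if.h>; the table only covers the capability names seen in this thread, and any other set bits are returned as leftovers rather than guessed (for example, e400bb has a 0x800000 bit that this ifconfig printed no name for).

```python
from __future__ import annotations

# Assumed IFCAP_* bit values from FreeBSD <net/if.h>; only names seen in this thread.
IFCAP_BITS = {
    0x00000001: "RXCSUM",
    0x00000002: "TXCSUM",
    0x00000008: "VLAN_MTU",
    0x00000010: "VLAN_HWTAGGING",
    0x00000020: "JUMBO_MTU",
    0x00000080: "VLAN_HWCSUM",
    0x00000100: "TSO4",
    0x00000200: "TSO6",
    0x00000400: "LRO",
    0x00000800: "WOL_UCAST",
    0x00001000: "WOL_MCAST",
    0x00002000: "WOL_MAGIC",
    0x00040000: "VLAN_HWTSO",
    0x00080000: "LINKSTATE",
    0x00200000: "RXCSUM_IPV6",
    0x00400000: "TXCSUM_IPV6",
    0x04000000: "NOMAP",
}


def decode_options(hex_value: str) -> tuple[list[str], int]:
    """Split an ifconfig 'options=' hex value into known names (in bit order)
    and any leftover bits not in the table."""
    value = int(hex_value, 16)
    names = [name for bit, name in sorted(IFCAP_BITS.items()) if value & bit]
    known_mask = 0
    for bit in IFCAP_BITS:
        known_mask |= bit
    return names, value & ~known_mask


def diff_options(a: str, b: str) -> set[str]:
    """Capabilities enabled in options value a but not in options value b."""
    return set(decode_options(a)[0]) - set(decode_options(b)[0])


if __name__ == "__main__":
    # ix2 (e400bb) vs re2 (8209b) from the post above.
    print(sorted(diff_options("e400bb", "8209b")))
    names, unknown = decode_options("e400bb")
    print(names, hex(unknown))
```

Run against the ix2 and re2 values, the difference comes out as JUMBO_MTU, RXCSUM_IPV6, TXCSUM_IPV6 and VLAN_HWTSO, which is where the offloading comparison would start.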

Steve

yswery

Thanks for the reply, @stephenw10. Are you talking about the "Hardware Checksum Offloading" and "Hardware TCP Segmentation Offloading" options in the UI, or is there something else?

My current options:

re0: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
	options=82098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>

and

ix3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>

Edit: my NICs:

re0@pci0:2:0:0:	class=0x020000 card=0x34687470 chip=0x816810ec rev=0x06 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet

and

ix3@pci0:7:0:1:	class=0x020000 card=0x00008086 chip=0x15e48086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection X553 1GbE'
    class      = network
    subclass   = ethernet

And the drivers loaded:

grep ix3 /var/run/dmesg.boot
ix3: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> mem 0xdd200000-0xdd3fffff,0xdd600000-0xdd603fff at device 0.1 on pci7
ix3: Using MSI-X interrupts with 9 vectors
ix3: Ethernet address: ac:1f:6b:b1:d8:af
ix3: netmap queues/slots: TX 8/2048, RX 8/2048

stephenw10

You can disable/enable any of those options manually; only those that have proved to be an issue in the past are in the GUI.

However, it looks like you have LRO enabled on the ix NIC, and that's something we disable by default. I have it disabled there on the 5100, which is C3K. That is a GUI option, so disable that first.
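Toggling a capability by hand is done with ifconfig(8), where each capability has a keyword that is disabled by prefixing it with "-". This is a small Python sketch that only builds the command (it does not run it), assuming the keyword table below; it covers just a subset of capabilities from this thread. Changes made this way apply only to the running system and do not survive a reboot, and pfSense may re-apply its own GUI offload settings when the interface is reconfigured.

```python
# ifconfig "options=" names -> ifconfig(8) keywords (prefix "-" to disable).
# Subset only; names not listed here raise KeyError instead of being guessed.
IFCONFIG_KEYWORDS = {
    "RXCSUM": "rxcsum",
    "TXCSUM": "txcsum",
    "RXCSUM_IPV6": "rxcsum6",
    "TXCSUM_IPV6": "txcsum6",
    "TSO4": "tso4",
    "TSO6": "tso6",
    "LRO": "lro",
    "VLAN_MTU": "vlanmtu",
    "VLAN_HWTAGGING": "vlanhwtag",
    "VLAN_HWCSUM": "vlanhwcsum",
    "VLAN_HWTSO": "vlanhwtso",
}


def disable_command(interface: str, capabilities: list[str]) -> list[str]:
    """Build an argv list such as ['ifconfig', 'ix3', '-lro']."""
    return ["ifconfig", interface] + ["-" + IFCONFIG_KEYWORDS[c] for c in capabilities]


if __name__ == "__main__":
    # Print instead of executing, so the command can be reviewed first.
    print(" ".join(disable_command("ix3", ["LRO"])))
    print(" ".join(disable_command("ix3", ["LRO", "VLAN_HWTSO"])))
```

Building an argv list rather than a shell string keeps it usable with subprocess.run without any quoting concerns.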

Steve

yswery

Sorry for necro-ing this thread, but after not getting anywhere with this issue 3 years ago, we decided to leave the Realtek card in for our QinQ clients. It worked great until now, when we had to remove it to free up the PCI slot.

So we moved our setup back to the X553 interface once again, and the exact same problem came back (this time on pfSense 2.6).

What we find even weirder is that our clients' smart TVs and home-assistant IoT devices aren't working, but only with services that communicate with AWS (Alexa, Netflix, Slack: all connecting to AWS).

I can 100% confirm this is not IP address blocking or anything on the AWS side, or even firewall rules on our end. I assume it's some packet size or MTU issue, since it only affects the QinQ clients. Could this be it? Or am I going crazy and totally not looking in the right place?

TL;DR: home smart TVs with apps that connect to AWS don't seem to work on our C3K NIC, while they DO work on a $15 Realtek NIC. How can this be?

ix4: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
	options=8038b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>

vs

re0: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
	options=82098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>

stephenw10

Those are just services that happen to be hosted in AWS? You're not connecting over IPsec, for example?

Can you actually see dropped packets if you run a pcap?

Do you see a reduced MTU on the QinQ NIC?

igb8.201: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
	ether 00:90:7f:db:ca:ae

igb8.201.301: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1496
	description: QinQ0
	options=4000000<NOMAP>
	ether 00:90:7f:db:ca:ae

QinQ is now supported directly by the vlan(4) driver, so the next version will not use netgraph, which changes a lot. Are you able to test a 2.7 snapshot when they become available? Some things will be broken in it initially, but hopefully not QinQ.

Steve

yswery

@stephenw10

Regarding the AWS thing: it's the only common factor in what 'doesn't work'. Everything else still seems functional at a glance, but I might be reading too much into it just to see a pattern. For example, a smart TV works with Disney+ but not with Prime or Netflix, and this was tested on 4 different smart TVs with different end-customer CPEs. Same for the Alexa devices.

I don't have any IPsec or VPN in use here at all, not even PPPoE encapsulation.

The QinQ interfaces all still show an MTU of 1500. Should I override this with 1496 in the web UI?

ix4: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
	options=8038b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
	ether ac:1f:6b:b1:d8:ae
	inet6 fe80::ae1f:6bff:feb1:d8ae%ix4 prefixlen 64 scopeid 0x5
	media: Ethernet autoselect (1000baseT <full-duplex,rxpause,txpause>)
	status: active

ix4.20: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	inet6 fe80::ae1f:6bff:feb1:d8ae%ix4.20 prefixlen 64 scopeid 0x12
	groups: vlan
	vlan: 20 vlanpcp: 0 parent interface: ix4
	media: Ethernet autoselect (1000baseT <full-duplex,rxpause,txpause>)
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

ix4.20.24: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=28<VLAN_MTU,JUMBO_MTU>
	groups: QinQ
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

stephenw10

@yswery said in C3K NIC with QinQ interfaces really slow upload:

should I override this with 1496 in the webui?

Yes, I would try that. The data I showed there is from a 2.7 image using the vlan driver.

Though I would expect almost anything to support 1508B packets.
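The 1496 and 1508 figures come from simple tag arithmetic: each 802.1Q/802.1ad tag adds 4 bytes after the Ethernet header, and QinQ carries two of them. This is a minimal Python sketch, assuming plain 4-byte tags and ignoring the Ethernet header and FCS (which are not counted in the MTU). For reference, the 2.7 output above fits this pattern: igb8.201.301 at 1496 plus one 4-byte tag is 1500, the MTU of igb8.201.

```python
TAG_BYTES = 4  # one 802.1Q (C-tag) or 802.1ad (S-tag) header


def bytes_after_header(ip_mtu: int, tags: int) -> int:
    """Bytes the parent must carry after the Ethernet header for an
    ip_mtu-sized packet wrapped in `tags` VLAN tags."""
    return ip_mtu + TAG_BYTES * tags


def max_ip_mtu(parent_budget: int, tags: int) -> int:
    """Largest IP packet that fits if *all* tags come out of parent_budget,
    i.e. assuming nothing (such as VLAN_MTU) absorbs any of them."""
    return parent_budget - TAG_BYTES * tags


if __name__ == "__main__":
    print(bytes_after_header(1500, 2))  # full-size packet on a QinQ interface
    print(max_ip_mtu(1508, 2))          # parent raised to 1508
    print(max_ip_mtu(1500, 2))          # parent left at 1500, no tag absorbed
```

This prints 1508, 1500 and 1492: raising the parent to 1508 leaves room for both tags on a full 1500-byte packet without depending on the NIC's VLAN_MTU handling.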

Steve

yswery

@stephenw10 said in C3K NIC with QinQ interfaces really slow upload:

Yes, I would try that. The data I showed there is from a 2.7 image using the vlan driver.

I tried this (along with changing the MTU on the QinQ interface to something like 1400), and the IoT devices started to work; however, with that, a whole load of other websites and services stopped working. Changing back to the default blank (1500 MTU) again resulted in the IoT devices + AWS combo not working.

Another thing I noticed is that if I VPN into one of our clients' CPEs, everything works OK, which makes me think it's an MTU issue somewhere. Is it worth getting the client CPE MTU settings changed to match?

Other than waiting for pfSense 2.7 to test with, does anything come to mind that could be causing this on these interfaces, while the cheapo Realtek NIC was A-OK for 3 years (same pfSense box, same configs)?

stephenw10

Some hardware off-loading is always my first suspect. The Realtek won't have had nearly as many off-loading options and most of them should have been disabled.

You could try setting the MTU the other way. So maybe set ix4 to 1508.

Or try setting MSS instead of MTU if that's not being detected correctly.
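MSS clamping rewrites the maximum segment size advertised in TCP SYNs so that TCP never sends segments too big for the path, even when PMTU discovery is not working. The value follows from the MTU by subtracting the IP and TCP headers. This is a small Python sketch, assuming headers without options; it does not model how the pfSense MSS field itself is interpreted.

```python
IPV4_HEADER = 20  # no IP options
IPV6_HEADER = 40  # fixed header, no extension headers
TCP_HEADER = 20   # no TCP options


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload per segment that fits in a packet of `mtu` bytes."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1496):
        print(mtu, mss_for_mtu(mtu), mss_for_mtu(mtu, ipv6=True))
```

For a 1500 MTU this gives 1460 (IPv4) and 1440 (IPv6); for 1496 it gives 1456 and 1436.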

Are you able to prove the reduced PMTU with large ping packets?

Are you preventing PMTU detection by blocking ICMP?
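Proving a reduced path MTU with large pings means sending packets with the Don't Fragment bit set and finding the largest size that still gets a reply. This is a minimal Python sketch, assuming FreeBSD ping(8) syntax (-D sets Don't Fragment, -s sets the ICMP data size; Linux and Windows use different flags) and a target host that actually answers ICMP echo. The probe is injected as a callable so the search can be tried against a simulated path; the host 192.0.2.1 is a hypothetical documentation address.

```python
from __future__ import annotations

import subprocess
from typing import Callable

IPV4_HEADER = 20
ICMP_HEADER = 8


def ping_payload_for_mtu(mtu: int) -> int:
    """ping -s value that produces an IPv4 packet of exactly `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(host: str, payload: int) -> list[str]:
    """FreeBSD ping: -D = Don't Fragment, -c 1 = one probe, -s = data bytes."""
    return ["ping", "-D", "-c", "1", "-s", str(payload), host]


def ping_probe(host: str) -> Callable[[int], bool]:
    """Probe that returns True if a DF ping of `size` bytes gets a reply."""
    def probe(size: int) -> bool:
        cmd = ping_command(host, ping_payload_for_mtu(size))
        return subprocess.run(cmd, capture_output=True).returncode == 0
    return probe


def find_path_mtu(probe: Callable[[int], bool], low: int = 576, high: int = 1500) -> int:
    """Binary-search the largest size in [low, high] for which probe succeeds.
    Assumes probe(low) succeeds; a host that never replies just returns low."""
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Simulated path that silently drops anything over 1496 bytes.
    print(find_path_mtu(lambda size: size <= 1496))
    print(" ".join(ping_command("192.0.2.1", ping_payload_for_mtu(1500))))
```

Against a real host, find_path_mtu(ping_probe("192.0.2.1")) would run the same search over actual pings; a destination that ignores ICMP echo will make every probe fail, so pick one that is known to answer.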

Steve

yswery

@stephenw10 said in C3K NIC with QinQ interfaces really slow upload:

You could try setting the MTU the other way. So maybe set ix4 to 1508.

OK, this (simple) adjustment fixed everything! All the earlier reports are gone and all devices (seem) to be working once again. Thank you so much for the help. Like you said, I suspect v2.7 of pfSense will probably work out of the box since it adjusts MTUs on the parent interface(s), but when the time comes I'll still keep an eye out. For now, an MTU of 1508 on the parent interface results in everything working again.

Regarding PMTU, we don't do any ICMP blocking at all on our end; however, for all those "AWS services" I mentioned, none of the destination IPs reply to ICMP at all (could that be the trigger for the situation?).

Regardless, thank you again for the suggestion, which ended up fixing the issue!

stephenw10

Ah, that's good to know. I'll have to test that in the new setup in 2.7 without netgraph.

Steve