Qotom J1900 4-port mini PC getting WAN-in errors on a 1 Gbps fiber connection
-
Hey all,
I just took delivery of one of those Qotom J1900 units with 4 Intel WG82583 NICs. It was this model: https://www.amazon.com/QOTOM-Intel-celeron-processor-Windows/dp/B01CSCGID0/ref=sr_1_3?s=pc&ie=UTF8&qid=1472359925&sr=1-3&keywords=qotom
My old hardware was a Lanner firewall unit with an i3 and 4 GB of RAM. It has always been able to route full gigabit speeds without any interface errors or other issues.
This new Qotom unit tops out at about 850-900 Mbps on average, and I am seeing a TON of errors in the WAN interface's input statistics.
I did some testing: I set up two networks, 172.16.100.0/24 on LAN and 172.16.200.0/24 on OPT1. The WAN is set to DHCP per my ISP. I put a host in each of the LAN and OPT1 networks, and iperf between those two shows 980 Mbps with zero interface errors (regardless of which host runs as server or client).
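For anyone who wants to reproduce the test, this is roughly what I ran (the host address is just an example from my test subnets):

# on the OPT1 host, e.g. 172.16.200.10
iperf -s

# on the LAN host, pushing traffic through pfSense for 30 seconds
iperf -c 172.16.200.10 -t 30 -i 5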
BUT, the moment I try a speedtest.net run, I can only get 800 or so, and thousands of errors pop up during the test. I thought it might be a bad NIC on WAN, so I tested that: I swapped WAN across all 4 interfaces (em0, em1, em2, and em3), and no matter which interface it was on, I would always see WAN input errors when running speedtest.net.
The other odd thing: I can use a few other speed test sites (fast.com, testmy.net, etc.) and I do not get the errors on my WAN, but of course, their servers are hardly ever capable of sending a full 1 Gbps download to me.
I've attached some screenshots of my old hardware (the Intel i3) and my new Qotom (the J1900) showing the errors. The screenshots were taken close together in time, after swapping the pfSense boxes around.
Has anyone else seen a huge amount of errors from WAN in on these boxes using speedtest.net at 1 gig speeds? What could be causing this?
Here's the screenshots.
This is the J1900 after only about 4 GB of transfer across the WAN: http://imgur.com/i6Q3S9Q.jpg
This is my old Core i3 hardware with about the same amount of transfer (both screenshots show speedtest.net data): http://imgur.com/9D93M8Z.jpg
-
The other odd thing: I can use a few other speed test sites (fast.com, testmy.net, etc.) and I do not get the errors on my WAN, but of course, their servers are hardly ever capable of sending a full 1 Gbps download to me.
Try https://www.dslreports.com/speedtest. That should give you a full 1 Gbps and at least let you know whether it's just related to throughput.
-
I recently experienced exactly the same behavior with a different fanless PC, a Fitlet, and went through the same troubleshooting routine as you did.
Please read the thread I started in the hardware forum, and you will see how to eliminate the errors you are seeing when routing LAN-to-WAN.
I'm actually surprised you can get that kind of throughput out of your box. The most I could squeeze out of mine was 600 Mbps LAN-to-WAN and 650 Mbps WAN-to-LAN in iperf3 tests.
The CPU in my box seems to be a little more powerful than yours, so I guess you are lucky to be getting that kind of throughput. If you follow the steps I took, you may improve your throughput, and you will definitely eliminate the input errors on the interfaces. Also, read the pfSense page on tuning NICs, since your NICs may use a different driver than mine: mine use igb, and yours may use em. Conceptually you will have to do the same things I described in that thread; only the syntax will differ slightly. You can get the correct syntax for your NICs from the NIC tuning document that pfSense published.
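If you're not sure which driver your NICs use, something like this from a pfSense shell should tell you (a rough sketch; exact output varies by box):

# interface names reveal the driver (em0/em1 vs igb0/igb1)
ifconfig | grep -E '^(em|igb)'

# show the NIC chipset sitting behind each driver
pciconf -lv | grep -B3 network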
-
The CPU in my box seems to be a little more powerful than yours
I don't think that's true in general. The one edge it has is AES-NI, but that doesn't matter unless you're comparing crypto performance. The Intel CPU has a higher clock speed (both base and turbo) and will at least match the AMD in IPC, if not exceed it. That alone can explain the difference in LAN-to-WAN throughput.
-
You may want to read a review of my box on AnandTech and also run a comparison of these two CPUs using one (or more) of the sites that offer that information.
-
Some tunables here: https://ashbyte.com/ashbyte/wiki/pfSense/Tuning
-
I have tried the suggested tunables and I am still getting errors, with only about 850 Mbps max throughput across the WAN. The CPU spikes to about 70% at most, and it's hardly using any RAM.
I suspect that if I can eliminate the errors, I'd see a marked increase in throughput.
-
Did you configure flow control in pfSense? If you did, you also need to configure flow control on the LAN switch port connected to the pfSense LAN interface. With flow control configured in pfSense (don't forget to reboot pfSense), pfSense will send flow control messages to the upstream switch port to pause incoming traffic, so the switch port must understand and obey those messages. That stops the input errors on the LAN interface. In my case, once I configured flow control in pfSense, input errors also stopped incrementing on the pfSense WAN interface. My pfSense box is still in the lab being tested, so a Mac Mini is connected directly to the WAN interface for running iperf3; apparently macOS has flow control enabled by default, because I didn't have to change anything on that side. I did, however, have to enable flow control on the Cisco switch port connected to the pfSense LAN interface before the input errors on the pfSense LAN interface stopped.
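My NICs use igb, but for your em NICs the knobs should look roughly like this (a sketch I haven't run on your hardware; the switch port name is just an example):

# on pfSense: check and set flow control on em0
# (0 = off, 1 = rx pause, 2 = tx pause, 3 = full)
sysctl dev.em.0.fc
sysctl dev.em.0.fc=3
# to persist across reboots, add dev.em.0.fc=3 under System > Advanced > System Tunables

# on a Cisco IOS switch: honor pause frames on the port facing the pfSense LAN interface
interface GigabitEthernet0/1
 flowcontrol receive on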
As for CPU utilization: when I run iperf3 through pfSense, the CPU utilization on the box spikes to about 50%. However, when I run iperf3 against the pfSense LAN interface IP itself, it spikes to about 70%.
In my opinion, you are getting good throughput. You need to eliminate the input errors, though.
-
I have no control over the upstream flow control settings, as my WAN is an Ethernet handoff from the demarc box holding the fiber-to-Ethernet media converter supplied by my ISP. All of my errors are WAN input errors.
On my old hardware I do not have to rely on flow control; I routinely see over 930 Mbps of throughput on it with no tuning at all.
-
I have no control over the upstream flow control settings, as my WAN is an Ethernet handoff from the demarc box holding the fiber-to-Ethernet media converter supplied by my ISP. All of my errors are WAN input errors.
On my old hardware I do not have to rely on flow control; I routinely see over 930 Mbps of throughput on it with no tuning at all.
I guess this is the price you pay for a cheap box to run pfSense: the weak CPU cannot keep up with the packets flooding the input buffers of the gigabit NICs. I have the same problem with my Fitlet box and am considering returning it and getting a Check Point instead, since it seems I would pay about the same for a Check Point appliance of similar performance.
-
You may want to read a review of my box on AnandTech and also run a comparison of these two CPUs using one (or more) of the sites that offer that information.
I did. Not trying to impugn the device, just suggesting why the OP may have higher throughput (errors notwithstanding). I'm in a similar boat: running a low-power AMD SoC, and even LAN-to-LAN throughput seems suspiciously low, at around 700 Mbps with iperf. Even my lowly SheevaPlug (a 1.2 GHz single-core ARM chip from 2009) can do that. Fortunately my WAN speed is low enough that I don't have to deal with what you guys are running up against. Best of luck getting it figured out.
-
I ran sysctl -a against the em.0 device and noticed these two counters:
dev.em.0.mac_stats.recv_no_buff: 4876
dev.em.0.mac_stats.missed_packets: 2019
This suggests that incoming packets may be arriving faster than the unit can handle. In the dashboard, the interface statistics report dev.em.0.mac_stats.missed_packets as that interface's input errors.
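For anyone following along, a crude loop like this shows whether the counters climb in real time during a speedtest (a sketch; adjust the device number to match whichever NIC is WAN):

while true; do
  sysctl dev.em.0.mac_stats.recv_no_buff dev.em.0.mac_stats.missed_packets
  sleep 1
done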
Any suggestions on how to tune these buffers?
Here is the full output of sysctl -a | grep em.0:
dev.em.0.wake: 0
dev.em.0.interrupts.rx_overrun: 0
dev.em.0.interrupts.rx_desc_min_thresh: 0
dev.em.0.interrupts.tx_queue_min_thresh: 0
dev.em.0.interrupts.tx_queue_empty: 0
dev.em.0.interrupts.tx_abs_timer: 8
dev.em.0.interrupts.tx_pkt_timer: 2
dev.em.0.interrupts.rx_abs_timer: 0
dev.em.0.interrupts.rx_pkt_timer: 690
dev.em.0.interrupts.asserts: 2858628
dev.em.0.mac_stats.tso_ctx_fail: 0
dev.em.0.mac_stats.tso_txd: 0
dev.em.0.mac_stats.tx_frames_1024_1522: 2879367
dev.em.0.mac_stats.tx_frames_512_1023: 33878
dev.em.0.mac_stats.tx_frames_256_511: 47852
dev.em.0.mac_stats.tx_frames_128_255: 59479
dev.em.0.mac_stats.tx_frames_65_127: 2185545
dev.em.0.mac_stats.tx_frames_64: 210228
dev.em.0.mac_stats.mcast_pkts_txd: 5
dev.em.0.mac_stats.bcast_pkts_txd: 42
dev.em.0.mac_stats.good_pkts_txd: 5416349
dev.em.0.mac_stats.total_pkts_txd: 5416349
dev.em.0.mac_stats.good_octets_txd: 4593379613
dev.em.0.mac_stats.good_octets_recvd: 11719132980
dev.em.0.mac_stats.rx_frames_1024_1522: 7626467
dev.em.0.mac_stats.rx_frames_512_1023: 72768
dev.em.0.mac_stats.rx_frames_256_511: 54886
dev.em.0.mac_stats.rx_frames_128_255: 94095
dev.em.0.mac_stats.rx_frames_65_127: 1216069
dev.em.0.mac_stats.rx_frames_64: 157708
dev.em.0.mac_stats.mcast_pkts_recvd: 0
dev.em.0.mac_stats.bcast_pkts_recvd: 1428
dev.em.0.mac_stats.good_pkts_recvd: 9221993
dev.em.0.mac_stats.total_pkts_recvd: 9224012
dev.em.0.mac_stats.xoff_txd: 0
dev.em.0.mac_stats.xoff_recvd: 0
dev.em.0.mac_stats.xon_txd: 0
dev.em.0.mac_stats.xon_recvd: 0
dev.em.0.mac_stats.coll_ext_errs: 0
dev.em.0.mac_stats.alignment_errs: 0
dev.em.0.mac_stats.crc_errs: 0
dev.em.0.mac_stats.recv_errs: 0
dev.em.0.mac_stats.recv_jabber: 0
dev.em.0.mac_stats.recv_oversize: 0
dev.em.0.mac_stats.recv_fragmented: 0
dev.em.0.mac_stats.recv_undersize: 0
dev.em.0.mac_stats.recv_no_buff: 4876
dev.em.0.mac_stats.missed_packets: 2019
dev.em.0.mac_stats.defer_count: 0
dev.em.0.mac_stats.sequence_errors: 0
dev.em.0.mac_stats.symbol_errors: 0
dev.em.0.mac_stats.collision_count: 0
dev.em.0.mac_stats.late_coll: 0
dev.em.0.mac_stats.multiple_coll: 0
dev.em.0.mac_stats.single_coll: 0
dev.em.0.mac_stats.excess_coll: 0
dev.em.0.queue_rx_0.rx_irq: 0
dev.em.0.queue_rx_0.rxd_tail: 873
dev.em.0.queue_rx_0.rxd_head: 874
dev.em.0.queue_tx_0.no_desc_avail: 0
dev.em.0.queue_tx_0.tx_irq: 0
dev.em.0.queue_tx_0.txd_tail: 801
dev.em.0.queue_tx_0.txd_head: 801
dev.em.0.fc_low_water: 16932
dev.em.0.fc_high_water: 18432
dev.em.0.rx_control: 67141658
dev.em.0.device_control: 1074790984
dev.em.0.watchdog_timeouts: 0
dev.em.0.rx_overruns: 3
dev.em.0.tx_dma_fail: 0
dev.em.0.mbuf_defrag_fail: 0
dev.em.0.link_irq: 0
dev.em.0.dropped: 0
dev.em.0.eee_control: 1
dev.em.0.rx_processing_limit: 100
dev.em.0.itr: 488
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_int_delay: 66
dev.em.0.rx_int_delay: 0
dev.em.0.fc: 3
dev.em.0.debug: -1
dev.em.0.nvm: -1
dev.em.0.%parent: pci1
dev.em.0.%pnpinfo: vendor=0x8086 device=0x150c subvendor=0x8086 subdevice=0x0000 class=0x020000
dev.em.0.%location: pci0:1:0:0 handle=\_SB_.PCI0.RP01.PXSX
dev.em.0.%driver: em
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.6.1-k
-
I ran sysctl -a against the em.0 device and noticed these two counters:
dev.em.0.mac_stats.recv_no_buff: 4876
dev.em.0.mac_stats.missed_packets: 2019
This suggests that incoming packets may be arriving faster than the unit can handle. In the dashboard, the interface statistics report dev.em.0.mac_stats.missed_packets as that interface's input errors.
Any suggestions on how to tune these buffers?
I've used these settings in the /boot/loader.conf.local file:
kern.ipc.nmbclusters=1000000
hw.pci.enable_msix=0
hw.igb.fc_setting=2
hw.igb.rxd=4096
hw.igb.txd=4096
You must reboot your pfSense box after adding these settings and saving the file.
The hw.igb.rxd=4096 setting (after I rebooted the pfSense box) eliminated the "recv_no_buff" errors.
You will have to replace .igb. with .em. in the settings above.
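For example, the em versions in /boot/loader.conf.local should look roughly like this (a sketch; I'm not certain the em driver exposes an equivalent of hw.igb.fc_setting as a loader tunable, so flow control may instead need to be set per interface via the dev.em.X.fc sysctl):

kern.ipc.nmbclusters=1000000
hw.pci.enable_msix=0
hw.em.rxd=4096
hw.em.txd=4096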
-
Well, I gave that a shot (replacing igb with em) and didn't see much of an improvement.
However, I noticed that when I enabled TRIM on my SSD, the WAN input errors seemed to decrease somewhat, but they are still there. Any other suggestions on tuning this thing?