Jetway JNF9D-2550 + Jetway AD3RTLANG 3 x Gigabit LAN Port Daughter Board

Guest

I did the original install with 2.0.1. From reading that report, it looks like that was fixed in the 2.1 beta release.

IuZ

Hello,
I have the same problem too, on pfSense 2.0.2.

I saw the link http://redmine.pfsense.org/issues/2595 and the other http://freshbsd.org/commit/freebsd/r237203

My question is:
Can it work if I change the file head/sys/dev/fb/fbreg.h, as in the link above, on my pfSense 2.0.2?
Thanks, bye.

seedrs

See http://forum.pfsense.org/index.php/topic,58150.0.html to fix the issue with the onboard NICs not showing up in 2.0.2. The solution is for i386 only.

IuZ

I saw that link (thank you all the same), but I have the x64 version. :(

seedrs

I have compiled the driver for x64 systems. You can download it here: http://forum.pfsense.org/index.php/topic,58150.msg311249.html. I have not tested it, since my router is currently running the i386 build.

IuZ

Really, thank you very much, I'll try it.
Thanks

IuZ

Update: I think your x64 driver works… 'cause from the pfSense interface I can see my other 2 NICs!! Thanks again! :)

seedrs

@IuZ:

> Update: I think your x64 driver works… 'cause from the pfSense interface I can see my other 2 NICs!! Thanks again! :)

You are welcome, I'm glad it worked.

elgo

Hi,
I'm running the exact same hardware setup (well, CPU frequency apart, it's a NF9D-2700+AD3RTLANG), and I'm experiencing very nasty things with both onboard Realtek 8111EVL NICs (let's call them 8111EVIL). When network activity goes up, they completely go mad and the error message shows "reX: watchdog timeout" with the link going up and down continuously.

I first checked cables and all, and checked that neither the BIOS nor the hardware was the cause: I can reproduce the problem with 100% success on pfsense beta 2.1 (so with the re driver from FreeBSD 8.3) on i386 and amd64. OpenWRT (linux 3.3.8 kernel) doesn't seem to show the symptom (although the test differed slightly: the pfsense box was receiving traffic, not the desktop):

  +-----pfsense---+
  |               |
8111EVL         8110SC
  |               |
 DMZ             LAN
  |               |
server     switch GS108Tv2
                  |
               Desktop

I generate traffic from the server to the desktop box (using nc on both machines). This traffic saturates the gigabit link, and immediately the DMZ NIC (so the 8111EVIL interface) goes "watchdog timeout".
If I do generate traffic from desktop to server, it won't trigger it (same commands).

It's not only a matter of raw throughput (as for packets per second, I don't know) because my WAN interface, which is the other 8111EVIL NIC and is connected to a 100Mbps fiber transceiver, suffers from this too (and traffic can vary from 20Mbps to 60Mbps in both directions, the "watchdog" symptom seems really random).

What I tried:
Disabling MSI/MSI-X: no impact.
Disabling all "hardware accelerated" stuff: no impact.

My question is: what can I try next? What data can I gather to help with a FreeBSD re driver fix?

drewy

Not terribly constructive I know but I paid the extra and got the intel daughter card. May be a cost effective fix, for 3 decent ports.

elgo

@drewy:

> Not terribly constructive I know but I paid the extra and got the intel daughter card. May be a cost effective fix, for 3 decent ports.

The 3 Realtek ports on the daughter board are fine (8110SC). I used them for years (had them on a previous Jetway mobo that "died").
The 2 Realtek ports on the motherboard are the ones causing trouble (8111EVL).

wallabybob

@elgo:

> I'm experiencing very nasty things with both onboard Realtek 8111EVL NICs (let's call them 8111EVIL). When network activity goes up, they completely go mad and the error message shows "reX: watchdog timeout" with the link going up and down continuously.

I SUSPECT your 8111EVL devices are PCI-Express devices and the devices on the daughter card are PCI devices sharing a single PCI bus. It is likely then that the 8111EVL devices can accept data at a much higher rate than it can leave the box through the daughter card devices, which are limited by the PCI bus speed. Consequently data going out the daughter card devices (under heavy load) will "bank up" on the transmit queue of the daughter card device and (if the driver is dumb about this) the transmit watchdog timer will go off because it takes "too long" for the transmit queue to empty.
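
To put rough numbers on the bus-speed argument above, here is a back-of-the-envelope sketch in Python. It assumes the daughter card sits on a conventional 32-bit, 33 MHz PCI bus (the thread does not confirm the width or clock), and every name in it is made up for illustration. The figures are theoretical peaks, so real throughput will be lower.

```python
# Back-of-the-envelope comparison; all figures are theoretical peaks.
# ASSUMPTION: 32-bit / 33 MHz conventional PCI for the daughter card.
PCI_BUS_WIDTH_BITS = 32
PCI_CLOCK_HZ = 33_333_333
GIGABIT_LINK_BPS = 1_000_000_000


def pci_peak_bps(width_bits=PCI_BUS_WIDTH_BITS, clock_hz=PCI_CLOCK_HZ):
    """Peak bits per second of a parallel PCI bus (one transfer per clock)."""
    return width_bits * clock_hz


def per_nic_share_bps(bus_bps, active_nics):
    """Even split of the shared bus between NICs that are busy at once."""
    return bus_bps / active_nics


if __name__ == "__main__":
    bus = pci_peak_bps()
    print(f"shared PCI bus peak: {bus / 1e6:.0f} Mbit/s")
    for nics in (1, 2, 3):
        share = per_nic_share_bps(bus, nics)
        print(f"{nics} busy NIC(s): {share / 1e6:.0f} Mbit/s each, "
              f"{share / GIGABIT_LINK_BPS:.0%} of gigabit line rate")
```

Under these assumptions the whole shared bus peaks only slightly above one gigabit link, and that budget covers both directions and every port on the card, which would fit the "bank up" explanation.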

Data from the DMZ to the internet could suffer a similar problem, not because of bus speed limitations but because the channel speed to the internet is much slower than the channel speed to the server.

Are you interested in doing some experiments to attempt to tune things a bit better? I suggest reassigning your interfaces so the daughter card provides the pfSense WAN interface and the two onboard devices are used for LAN and DMZ. Then try your test again from DMZ to LAN and see if any watchdog timeouts are reported and note the interface reporting them.

Please also post the output of pfSense shell commands:
```
dmesg
pciconf -l
sysctl -a | grep re
/etc/rc.banner
```

The re driver MIGHT provide some tunables that could be adjusted to attempt to stop the messages.

@elgo:

> I generate traffic from the server to the desktop box (using nc on both machines). This traffic saturates the gigabit link,

What sort of traffic is this? I presume some sort of datagram traffic (UDP?) rather than TCP.

@elgo:

> If I do generate traffic from desktop to server, it won't trigger it (same commands).

In that direction the incoming data rate is limited by the PCI bus and will be lower than the rate at which traffic can leave the system over the PCI Express bus, so it is unlikely to back up on the transmit queue.

Guest

I don't know if this will help you.
I have run iperf on both the onboard and daughter board network ports, as a server, and was maxing out at about 500 Mbps.
The CPU was almost at 100%, I assume that is where the slowdown was.

Have you tried an internal iperf test from a PC on a gigabit network?

elgo

Thank you for your interest, gentlemen.
@wallabybob: you are totally right about PCIe/8111EVL and PCI/8110SC.
As for re driver tunables, I tried MSI/MSI-X and intr_filter, but not hw.re.prefer_iomap, which I don't understand. No luck. Though I see a dev.re.%d.int_rx_mod that I didn't test… well, it might be worth a try.
As for the shell commands, I'll post their output soon; for now the pfsense box has been temporarily taken out of "the production network" :).

Yesterday I simplified my test case (which may be what you thought about, JoeMcJoe):

  +-----pfsense---+
  |
8111EVL
  |
 DMZ
  |
server

I generate traffic from the server (TCP, or UDP with the -u option; it doesn't affect the results: cat /dev/zero | nc -vv -u pfsense_IP pfsense_port) and receive it on the pfsense box (nc -l other_options_i_dont_remember > /dev/null).
"reX: watchdog timeout" shows up in the dmesg output on the pfsense box and on its VGA console.
The same test with the pfsense box generating traffic and the server receiving it doesn't trigger this error message.
During this test, no CPU core on the pfsense box was at 100% (at most one core at about 70%, used by the NIC-related "intr" thread. Remember it has 4 logical cores, as it's a hyperthreaded dual-core Atom).

So the PCIe/PCI rate problem was a smart guess, but it's not involved in this test case.

elgo

@wallabybob:

> Please also post the output of pfSense shell commands:
> ```
> dmesg
> pciconf -l
> sysctl -a | grep re
> /etc/rc.banner
> ```

I filtered the output of sysctl to what seems really related to re.

banner.txt
dmesg.txt
pciconf.txt
sysctl_re.txt

wallabybob

Thanks for the additional information.

When you run the "nc" test which pfsense interface is the server connected to? DMZ (re1)?

Which interfaces are reported in the "rex: watchdog timeout" messages? (The driver includes the comment "Tx completion interrupt which seems to be lost on PCIe based controllers under certain situations", but the watchdog function also looks for received frames.)

Interface re0 runs a bunch of VLANs including (apparently) a VLAN for the PPPoE WAN interface. Correct?

Which interfaces are members of the bridge interface?

Are you using jumbo frames?

elgo

@wallabybob:

> Thanks for the additional information.

Thank you for taking the time to help me :)

@wallabybob:

> When you run the "nc" test which pfsense interface is the server connected to? DMZ (re1)?
>
> Which interfaces are reported in the "rex: watchdog timeout" messages? (The driver includes the comment "Tx completion interrupt which seems to be lost on PCIe based controllers under certain situations", but the watchdog function also looks for received frames.)

Yes, re1 is the DMZ interface with the server at the end, and it's the one reporting the watchdog timeout in my test case. During normal operation, when the "WAN" happens to complain, it's re0 (the physical interface it is ultimately based on) that's reported.

@wallabybob:

> Interface re0 runs a bunch of VLANs including (apparently) a VLAN for the PPPoE WAN interface. Correct?

Yes, "The Internet, by Orange". The pfSense WAN interface is PPPoE over a VLAN over a physical link (re0). And it's supposed to be an optical fiber access commercial offer… innovative technologies and all (ppp).

@wallabybob:

> Which interfaces are members of the bridge interface?

Erk, I'll detail this, it's not trivial, but I'm not sure it will be really useful, as the bridge is not involved in the last simplified test case.

BRIDGE_TV 
	|- ORANGE_VLAN_838 --> re0
	|- ORANGE_VLAN_839 --> re0
	|- ORANGE_VLAN_840 --> re0
	|- ORANGE_VLAN_841 --> re0
	|_ TRUNK_TV (vlan 1000) --> re4

Actually, there is barely any traffic there (occasional IGMP/multicast announcements). No IP address is configured on any of these bridged interfaces nor on the bridge itself.

@wallabybob:

> Are you using jumbo frames?

Not on re0 or re1 (WAN/DMZ) which are the 2 EVIL NICs. re2 (LAN) and re3 (DMZ2, unused for now) have jumbo frames enabled (7k, due to their chip design).

To sum it up:

re0 - 8111EVL -> …. -> WAN
re1 - 8111EVL -> DMZ
re2 - 8110SC -> LAN
re3 - 8110SC -> DMZ2 (unused)
re4 - 8110SC -> … -> bridge

n1ko

I'm seeing 100% similar issues with my Realtek card. Loading the server from the client (aka running iperf -s on the pfsense machine and iperf -c on a desktop) results in link flapping. If I test it the other way around (iperf -s on a desktop, iperf -c on the pfsense) it works fine.

I'm running a single physical card (integrated) with two vlans. Connected to a HP 1810G v2 switch over a 1Gbps link. Cables tested, no jumbos (although i have tried with jumbos too). All offloading features etc have been turned on and off for testing, no help.

One difference though, i'm not seeing any watchdog timeouts anywhere. Only link flapping.

dmesg:
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F PCIe Gigabit Ethernet> port 0xe800-0xe8ff mem 0xfdffb000-0xfdffbfff,0xfdffc000-0xfdffffff irq 19 at device 0.0 on pci4
re0: Chip rev. 0x2c800000
re0: MAC rev. 0x00000000
pciconf:
re0@pci0:4:0:0: class=0x020000 card=0x31251019 chip=0x816810ec rev=0x06 hdr=0x00
uname:
FreeBSD pfsense.koti 8.3-RELEASE-p6 FreeBSD 8.3-RELEASE-p6 #0: Tue Mar 19 06:22:08 EDT 2013 root@snapshots-8_3-amd64.builders.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_SMP.8 amd64

elgo

Yeah, it really seems to be a driver issue (don't get me wrong, I know what OSS is :)) or a "broken hardware by design" issue (which can be verified by trying another OS. I'll give Linux-based OpenWRT AA a shot on this hardware in a couple of weeks, and test it thoroughly).

Additionally, I took a look at the differences between the re driver in FreeBSD 8.3 (current pfsense 2.1 beta) and 9.1, but I don't see much hope there (hardly any difference "features side").

So either it's a matter that is to be reported upstream (how? what data are to be provided?) or it's… hopeless. I don't even know the technical differences between 8111E and 8111E-VL chips (they seem different as they weren't supported by re driver at the same time according to driver's changelog).
Would it be useful to create a new dedicated topic to issues with realtek 8111E & other variants?

--
edit:
FreeBSD 8.3 based FreeNAS users have the same issue (people that don't have simple cable issues, of course).

wallabybob

@elgo:

> So either it's a matter that is to be reported upstream (how? what data are to be provided?)

FreeBSD problem reports can be lodged at http://www.freebsd.org/send-pr.html

Reports so far suggest the problem might be related to heavy receive traffic. PERHAPS the device experiences receive overflow and the driver doesn't really know how to recover.

I haven't found any way of setting "flow control" in FreeBSD. (Receiver sends XOFF when it wants transmitter to stop, sends XON when transmitter can resume.) Maybe it needs to be set on these devices because they have minimal buffering and the driver doesn't set it. Maybe it can't be set. Maybe it is set and the other end is ignoring it.
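
As a purely illustrative toy model of the receive-overflow guess above (it is not how the re driver or the 8111E hardware actually works), the Python sketch below fills a small receive ring faster than it is drained, with and without the sender backing off. The function name, the ring size and the rates are all hypothetical.

```python
def simulate_rx_ring(ring_size, arrivals_per_tick, drained_per_tick, ticks,
                     pause_threshold=None):
    """Toy model of a NIC receive ring. Returns (frames_dropped, frames_delivered).

    If pause_threshold is set, the sender is assumed to stop sending for a tick
    whenever the ring already holds at least that many frames (flow control).
    """
    queued = dropped = delivered = 0
    for _ in range(ticks):
        paused = pause_threshold is not None and queued >= pause_threshold
        incoming = 0 if paused else arrivals_per_tick
        # Frames that do not fit in the free part of the ring are lost.
        accepted = min(incoming, ring_size - queued)
        dropped += incoming - accepted
        queued += accepted
        # The host drains a bounded number of frames per tick.
        done = min(queued, drained_per_tick)
        queued -= done
        delivered += done
    return dropped, delivered


if __name__ == "__main__":
    # HYPOTHETICAL numbers: sender is faster than the host can drain.
    print("no flow control:  ", simulate_rx_ring(256, 100, 60, 100))
    print("with flow control:", simulate_rx_ring(256, 100, 60, 100,
                                                 pause_threshold=156))
```

In this model the ring overflows once the sender outpaces the host for long enough, and pausing the sender before the ring fills avoids the drops. Whether the real hardware and driver behave like this is exactly the open question.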

It could be interesting to see what numbers come out of the following test:
While iperf (or similar) test is running and before it terminates, give shell commands
```
vmstat -i ; sleep 10 ; vmstat -i
```
and
```
netstat -i ; sleep 10 ; netstat -i
```
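
The "rate" column of vmstat -i is an average since boot, so the interesting number is the difference between the two snapshots. A minimal Python sketch for computing it is below. The sample text is invented and only mimics the usual "name, total, rate" column layout, and the function names are made up for illustration.

```python
# Sample text below is INVENTED for illustration; it only mimics the usual
# "name  total  rate" column layout of `vmstat -i`.
BEFORE = """\
interrupt                          total       rate
irq256: re0                       100000         50
irq257: re1                       200000        100
"""

AFTER = """\
interrupt                          total       rate
irq256: re0                       100500         50
irq257: re1                       450000        224
"""


def parse_totals(text):
    """Return {interrupt source: cumulative total} from vmstat -i style text."""
    totals = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows end in two integers (total, rate); skip header and blanks.
        if len(parts) < 3 or not (parts[-2].isdigit() and parts[-1].isdigit()):
            continue
        totals[" ".join(parts[:-2])] = int(parts[-2])
    return totals


def interrupt_deltas(before_text, after_text, seconds):
    """Return {source: (interrupts during interval, interrupts per second)}."""
    before = parse_totals(before_text)
    after = parse_totals(after_text)
    return {
        name: (after[name] - before[name], (after[name] - before[name]) / seconds)
        for name in after
        if name in before
    }


if __name__ == "__main__":
    for name, (count, rate) in interrupt_deltas(BEFORE, AFTER, 10).items():
        print(f"{name}: {count} interrupts in 10 s ({rate:.0f}/s)")
```

To use it on real data, save each vmstat -i snapshot to a file and pass the two file contents in place of BEFORE and AFTER.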


@elgo:

> or it's… hopeless. I don't even know the technical differences between 8111E and 8111E-VL chips

It might be as simple as a different PCI ID. It might be a complete redesign.