Weird behaviour with an Intel i350-T4 Controller pfSense 2.2



  • Hi Everyone

    I'm using the igb driver supplied with pfSense and have applied the recommended tweaks as per the documentation.

    I've recently bought a brand new OEM Intel i350-T4 quad-port NIC for my pfSense 2.2 box, but from the start I had weird issues with it. First off, the kernel threw a panic complaining that the NVRAM checksum was invalid. After toying around with Intel's bootutil, however, I managed to get it working under both Linux and pfSense.

    Now the igb1 interface used for the VLAN part (carrying 10 VLANs) drops its auto-negotiated speed to 100 Mbit after a period of time, even though the main switch, an HP 1810-24Gv2, is fully gigabit compliant. It happens to the other igb interfaces too.

    I've ensured that the switch firmware is up to date, tried different ports on the switch, and swapped the factory-made Ethernet cables for brand new Cat 5e and Cat 6 ones, without success.
    When I force the switch port to run at 1000 Mbit full duplex, the connection becomes really unstable.

    Any ideas pointing in the right direction are very welcome.

    Thanks in advance.

    Best regards
    Glenn :)


  • Netgate Administrator

    Hmm, that sounds unpleasant. :(

    If you force the switch to 1000Mbps you must also force the pfSense end to the same speed and duplex. If one end is auto-negotiating and the other end doesn't respond to that, odd things happen.

    Anything in the logs? Before the connection changes, is it dropping packets or showing collisions etc.? You might look at the sysctls for errors here.

    Are all four NICs connected to the same switch? Have you tried a different type of switch?

    Steve



  • @stephenw10:

    Indeed it is. It's quite annoying, actually, when you want to fully utilize the network for moving virtual machines and routing traffic :)

    Yes, I know, and I did actually force the speed in both ends.

    Currently only igb0 and igb1 are in use: igb0 for WAN and igb1 for the VLANs. I've tried connecting a Cisco SG102 (unmanaged) just to see whether it keeps the auto-negotiated gigabit speed, and the link drops to 100 Mbit after roughly 15 minutes.

    The weird part is that the igb0 interface, which is connected to my VDSL2 modem, properly sustains a gigabit connection (it's a Sagem modem-router, Fast 3464 something, crap from my telco, where the connection is bridged to one of the switch ports in the modem).

    The only thing I can find in system.log is that some of the VLANs that don't carry any traffic yet are being brought up and down by the kernel.
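
    To pin down exactly when the link falls back from 1000 to 100 Mbit, something like the following could log the negotiated media over time. It is only a sketch: the function name is made up, and it assumes the usual FreeBSD ifconfig layout where an auto-negotiated link shows up as "media: Ethernet autoselect (1000baseT <full-duplex>)". A port with forced media has no parenthesised part and would print nothing.

    ```shell
    # Extract the negotiated media from ifconfig output read on stdin,
    # i.e. the text inside the parentheses on the "media:" line.
    negotiated_media() {
        sed -n 's/^[[:space:]]*media:.*(\(.*\)).*$/\1/p'
    }

    # Assumed usage on the firewall: one timestamped line per minute,
    # so the moment of the speed drop can be matched against system.log.
    #   while sleep 60; do
    #       echo "$(date '+%F %T') igb1 $(ifconfig igb1 | negotiated_media)"
    #   done
    ```

    Redirecting the loop's output to a file and comparing the timestamps with the VLAN up/down messages might show whether the two events coincide.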

    ./Glenn



  • Power management on the PCI Express port is disabled. However, could a buggy BIOS still let power management interfere with the card, or could the issue stem from using MSI-X interrupts?


  • Netgate Administrator

    Hard to say. The fact that it initially had a bad NVRAM doesn't inspire confidence.
    Check the sysctl error counters for anything that looks odd compared to the working interface. I forget exactly where they are off-hand; I usually just grep for them on the few occasions I've had to look. I think you'll see them with:

    sysctl dev.igb
    

    You can narrow that down.
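
    One rough way to narrow it down is to keep only the counters whose names hint at trouble and whose values are non-zero. This is a sketch, not a documented list: the keywords in the pattern (err, drop, coll, miss, fail) are a guess at how the igb driver names its counters, and the function name is made up.

    ```shell
    # Read "name: value" lines (the format sysctl prints) on stdin and
    # print only suspicious-looking counters that are not zero.
    igb_nonzero_errors() {
        grep -i -E 'err|drop|coll|miss|fail' | awk -F': ' '$2 + 0 != 0'
    }

    # Assumed usage: compare the flaky port with the one that behaves.
    #   sysctl dev.igb.1 | igb_nonzero_errors
    #   sysctl dev.igb.0 | igb_nonzero_errors
    ```

    If both ports print nothing, the counters are clean and the problem is more likely below the driver.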

    Steve



  • @stephenw10:

    Grepping through sysctl didn't reveal any transmission errors or strange behavior.

    Disabling flow-control is a complete no-go and forcing the kernel to use MSI interrupts made the controller unstable.
    I did however raise the num_queues from 1 to 2.
    Hardware TSO and LRO are disabled through the checkboxes in the WebGUI.

    However, as of writing, the link between the pfSense box and the switch has stayed at auto-negotiated 1000baseT full-duplex for 6 hours.

    I'll keep investigating the issue.

    ./Glenn


  • Netgate Administrator

    @bitsmurf:

    Disabling flow-control is a complete no-go and forcing the kernel to use MSI interrupts made the controller unstable.

    You mean as opposed to MSI-X or regular interrupts? I would expect it to operate OK without MSI-X.

    Is this a real i350-T4 or one of those coming directly from China at huge discount?

    Spontaneously re-negotiating a new connection speed without being disconnected is something happening at a low level.

    Steve



  • @stephenw10:

    Yes, I reverted to MSI-X interrupts. I must admit I haven't tried regular interrupts yet.

    Before buying the controller I did read about the Chinese knockoffs here on the forum, so I ended up buying the card from Germany through eBay (this listing, actually: http://www.ebay.com/itm/111442450246?_trksid=p2060778.m2749.l2649&ssPageName=STRK%3AMEBIDX%3AIT), as that shaved 50% off the price compared to those in Denmark.

    Indeed, I know the negotiation happens in the NIC's firmware at the physical layer :)


  • Netgate Administrator

    Hmm, I don't see 'Intel' on the PCB at all in that eBay listing. Is it on the board you have? Also, the Ethernet PHY ICs look wrong. Not looking good, to be honest.  :-\

    Steve



  • @stephenw10:

    Nope, no Intel logo, only the sticker on the back as shown in the listing.

    I'll guess it's a Chinese knockoff then? :/

    ./Glenn


  • Netgate Administrator

    That would be my guess. Not that that's necessarily a problem; as you've probably read, there are quite a few positive reviews here on the forum. However, in light of the link issues, I think I would be returning that ASAP.

    Steve


  • Banned

    @bitsmurf:

    I'll guess it's a Chinese knockoff then? :/

    Kinda suggested there:

    Herstellungsland und -region (Country of origin): China


  • Netgate Administrator

    Aren't the real cards also built in China?



  • Steve, I guess you're right, it's better to RMA the product.
    Looking through other listings, I've noticed that genuine NICs have the Intel logo silkscreened on the PCB near the Ethernet connectors.

    Doktornotor: I did notice that before ordering. The genuine cards are built in China too.

    ./Glenn



  • A little bit too late, but this solved the problem for me and for others here in the pfSense forum:
    if you get a new Intel i350-T4, and it is a genuine one, please update its firmware to the
    latest version, and everything should run fine. Perhaps this also helps others who find this thread via Google
    and have the same problem.

