Dell R200 crashes on IBM Intel/Pro 1000 Quad NIC



  • G'day to you all, lovers of the finest firewall in the world  ;D

    I am the happy owner of a second pfSense machine, a Dell R200 (see sig), on which I managed to install pfSense a few days ago, as I reported here: https://forum.pfsense.org/index.php?topic=74316.0

    I proudly reported that it worked, and it does - but not any more once I add a quad-port IBM Intel PRO/1000 PT NIC (P/N 39Y6138).

    I need this NIC as the Dell has only two NICs, and this second machine is a fallback for my first pfSense box, which has dual WAN (which is another fallback; it seems I always want to have fallbacks  ;D).

    Now, before I bought it, I of course first googled. I found this thread:

    https://forum.pfsense.org/index.php?topic=68535.0

    But as that was another motherboard, I decided to try it anyway. And the problem in that thread indeed did not occur: the quad NIC was recognized by the Dell, and I could configure a WAN2 that worked - for a while.

    However:
    1. While customizing WAN2 (cable, DHCP, MTU 1500) I wanted to enable Snort. The system crashed hard (spontaneous reboot while I was in the middle of configuring).
    2. I decided to skip configuring Snort for a while, left it running without, and used it to browse the internet. After, say, half an hour WAN2 (cable) didn't connect any more. Gateway down. The system however had not crashed, nor was there a message on the console. I went to bed ( ;D).
    3. This morning, I booted the Dell again as I wanted to take some screenshots of the BIOS to post here. When I came back to look at the BIOS, the system had crashed. And that did give a clue: 'something with IRQs'.

    I have no clue how to solve this problem, but I did notice that these IRQs share the same address/channel/whatever-it-is-called with onboard devices.

    Would anybody happen to know how to solve this problem?

    As always, I'd be in big debt for your help  :P

    Thank you very much in advance,

    Bye,

    PS: I am currently uploading the screenshots, so it might take 5 minutes before I have attached them here.

    EDIT: I disconnected the card. On it, it says both P/N 39Y6138 and P/N 39Y6137. It 'seems' this has the 82571 chipset (I cannot really find an Intel spec sheet), and that chipset should be supported by the em(4) driver, as written here: https://www.freebsd.org/releases/8.3R/hardware.html
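
    (For anyone wanting to confirm the same thing on a running box: the PCI bus can be queried from the pfSense shell. A diagnostic sketch - the em device naming assumes the 82571 guess above is right and the ports attach via em(4):)

    ```shell
    # Vendor/device strings for network devices; 82571-based ports
    # should appear attached to the em(4) driver as em0..emN
    pciconf -lv | grep -B3 -i network

    # Kernel probe lines from the em driver, including chipset details
    dmesg | grep -i '^em[0-9]'
    ```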




  • (I really wish the limit on attachment size was increased  :-X).

    Pic:





  • I think you will run into the same issue my pfSense had at first after installation.

    I have it running on a Dell R210 II with an Intel quad-port NIC. However, mine is an I210.  ;)

    In my case the NICs blew up the MBUF usage until the system crashed - and at that point the system didn't even have any traffic to manage!
    Take a look at the MBUF usage on the dashboard.

    For me I could resolve it by adding

    kern.ipc.nmbclusters="131072"
    hw.igb.num_queues=1
    

    to /boot/loader.conf.local

    The first line pumps up the NICs' memory buffers, which takes a little bit more system memory. However, your system has enough of it. The second shuts off multiple queues on the Intel card.
    "igb" here is the name of my NIC driver. As far as I know your card uses the em driver, so replace it with "em", please.

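    A minimal sketch of what that em(4) variant could look like (an assumption on my part - the hw.em.* names simply mirror the igb tunables above, and em(4) may not expose a num_queues knob on every FreeBSD release, so check "sysctl hw.em" first):

    ```shell
    # /boot/loader.conf.local - kept separate so pfSense upgrades
    # don't overwrite it, unlike /boot/loader.conf
    kern.ipc.nmbclusters="131072"    # enlarge the mbuf cluster pool
    hw.em.num_queues=1               # hypothetical em counterpart of hw.igb.num_queues
    ```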

  • Netgate Administrator

    That doesn't explain why it threw an error while in the BIOS.
    I would certainly try disabling on-board devices to free up IRQs. Maybe you can manually assign separate IRQs to each NIC. It seems likely that this combination of hardware has been used before and someone else will have found a solution, especially if it's OS-independent.
    Are you running the latest BIOS?
    When it does boot into pfSense, are the NICs on the card using MSI or MSI-X?

    Steve
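
    (A couple of diagnostic commands that would answer the MSI/MSI-X question from the pfSense shell - a sketch; the em0..em3 device names assume the quad ports attached via em(4):)

    ```shell
    # Per-device PCI capability listing; the quad ports should show
    # MSI and/or MSI-X capability lines if the card supports them
    pciconf -lc

    # The driver reports which interrupt type it negotiated at attach
    dmesg | grep -iE 'em[0-9].*(msi|irq)'
    ```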



  • Hate to say this, but as you said, there's an entire thread on it ( https://forum.pfsense.org/index.php?topic=68535.0 ).

    I am not certain exactly why this is an issue, but my guess is that it has something to do with the following:
    1. Incompatibility of the bridge chip on the card - it must run at PCI Express Gen 1 speed.
    2. The issues mentioned here: http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5079623 and here: http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5076250 (the second has a potential solution for you)

    A clue (from the first link):
    "On the IBM product label on the back of the adapter, you can view the 11S Y header code content or scan with the scannable bar code:

    If the reading shows "YK50KK," this is the new adapter hardware level that is PCI Express Gen 2 compatible and should work fine with PCI Express Gen 2 slots. This adapter hardware level supports both PCI Express Gen 1 and Gen 2.

    If the reading shows "YK50EX," this is the old adapter hardware level that is not supported for PCI Express Gen 2 slots. This adapter hardware level supports PCI Express Gen 1 only.

    The older 39Y6136 adapter hardware level is not compatible with the PCI Express Gen 2 slots on servers, and therefore it must be replaced with the newer revision."

    My solution was to return it and get an I350-AM4 (Chinese) card from eBay for about $100, which has since worked flawlessly.



  • Thank you to all of you for replying  :-*

    I took some time to experiment with a zillion options, and I have good news: it works.

    So, 'for future generations' ( ;D) this is what I did:

    • I gave up on the IBM NIC. Luckily for me, I had ordered two different NICs (I have two pfSense boxes, one main and one backup, and need a quad NIC in each of them). The other one I had was a Dell YT674 (which is also an Intel quad NIC).
    • I installed this Dell NIC in the Dell R200, and at least it didn't crash pfSense.
    • Unfortunately, it had its own problems, which are documented here: https://forum.pfsense.org/index.php?topic=74942.msg410209#msg410209

    To summarize, this is how I got the Dell NIC to work in the Dell R200:

    • Upgrade pfSense (amd64) 2.1 to 2.1.1 and then, 'just to be sure', to 2.1.2.
    • Reduce the cores in the BIOS from 4 to 2.
    • Add this to /boot/loader.conf.local:
    
    #for intel nic
    kern.ipc.nmbclusters="131072"
    #hw.igb.num_queues=1
    hw.igb.rxd=4096
    hw.igb.txd=4096
    

    I monitored this for the last two days and everything is stable per the attached screenshot.
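
    (To verify the tunables actually took effect after a reboot, a quick check from the shell - assuming the igb driver as in the snippet above, and that it exposes its loader tunables read-only via sysctl:)

    ```shell
    # Values should match /boot/loader.conf.local after a reboot
    sysctl kern.ipc.nmbclusters hw.igb.rxd hw.igb.txd
    ```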

    Scientifically, I should try to determine which (combination) of these three steps actually solved it, but this took me so much time that I even told my wife to postpone celebrating my birthday, as I was in no mood to celebrate anything (due to the hours and hours of googling and reading and trying), so I will postpone the scientific lab results  ;D

    But all together these three things, and the occasional reboots of course, appear to have solved it.

    Once again: thank you very much for your kind help(!)

    Bye  ;D




  • @Hollander:

    
    hw.igb.rxd=4096
    hw.igb.txd=4096
    

    Adding up the rx/tx descriptors from all interfaces cannot exceed a total of 4096. Try this and see how things run.

    
    hw.igb.txd="2048"                 # number of transmit descriptors allocated by the driver. 2048 limit (default 1024)
    hw.igb.rxd="2048"                 # number of receive descriptors allocated by the driver, 2048 limit (default 1024)
    


  • @foonus:

    @Hollander:

    
    hw.igb.rxd=4096
    hw.igb.txd=4096
    

    Adding up the rx/tx descriptors from all interfaces cannot exceed a total of 4096. Try this and see how things run.

    
    hw.igb.txd="2048"                 # number of transmit descriptors allocated by the driver. 2048 limit (default 1024)
    hw.igb.rxd="2048"                 # number of receive descriptors allocated by the driver, 2048 limit (default 1024)
    

    Thank you very much for your reply  ;D

    Well, this is weird  :o

    The wiki says '4096', as do many threads on the FreeBSD forums. However, I was intrigued by:

    number of receive descriptors allocated by the driver, 2048 limit (default 1024)

    Could I ask where you got this text from? Because then it would appear all the other information is wrong (?)

    Especially given this:

    LOADER TUNABLES
        Tunables can be set at the loader(8) prompt before booting the kernel or
        stored in loader.conf(5).

    hw.igb.rxd
        Number of receive descriptors allocated by the driver.  The
        default value is 256.  The minimum is 80, and the maximum is
        4096.

    hw.igb.txd
        Number of transmit descriptors allocated by the driver.  The
        default value is 256.  The minimum is 80, and the maximum is
        4096.

    From:

    http://www.freebsd.org/cgi/man.cgi?query=igb&sektion=4&manpath=FreeBSD+8.1-RELEASE
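
    Whichever limit turns out to be right, the memory at stake is modest either way. A back-of-the-envelope sketch (hypothetical assumptions: four ports, one rx and one tx ring per port, 16-byte legacy descriptors):

    ```shell
    # Descriptor ring memory for a quad-port card at the man page
    # maximum (4096) versus the suggested 2048, assuming 16-byte
    # descriptors and one rx + one tx ring per port.
    ports=4; desc_bytes=16
    for ring in 4096 2048; do
      total=$(( ports * 2 * ring ))     # rx + tx descriptors, all ports
      echo "${ring}-descriptor rings: $(( total * desc_bytes / 1024 )) KiB"
    done
    ```

    Under those assumptions the rings stay in the hundreds-of-KiB range even at the maximum, which suggests the instability discussed earlier in this thread had more to do with mbuf cluster usage than with raw descriptor memory.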