[Fixed] Crash due to receiving jumbo frames with Applianceshop.eu Dual GHz



  • We are using a pfSense installation supplied by Applianceshop.eu and have started to experience sudden total lockups/crashes, with nothing shown on the console or in the logs.

    Link to hardware:
    http://www.applianceshop.eu/index.php/opnsense-dual-ghz-rack-edition-pfsense-appliance.html#specifications

    The hardware would randomly lock up within 1-3 minutes of operation after a hard boot (pulling the power plug).

    After a good while of investigating and limiting the traffic reaching the firewall, we ended up running tcpdump on the LAN interface (which was re1 in our case).

    Dmesg info on re1:

    re1: <RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet> port 0xce00-0xceff mem 0xcfbff000-0xcfbfffff,0xff8fc000-0xff8fffff irq 18 at device 0.0 on pci4
    re1: Using 1 MSI messages
    re1: Chip rev. 0x28000000
    re1: MAC rev. 0x00000000
    miibus1: <MII bus> on re1
    rgephy1: <RTL8169S/8110S/8211B media interface> PHY 1 on miibus1
    rgephy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
    re1: [FILTER]
    

    A moment before the lockup tcpdump included the following lines:

    19:55:36.358159 00:00:00:00:00:00 > 00:00:00:00:00:00 Null Information, send seq 0, rcv seq 0, Flags [Command], length 8986
    19:55:36.358216 00:50:56:5c:3a:a8 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 9000: 
            0x0000:  026e 8922 026f 7564 230f 0380 4543 454e  .n.".oud#...ECEN
            0x0010:  004d 3c5e 11f3 0000 0000 0000 0000 0000  .M<^............
            0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0040:  6434 2064 3220 3039 2035 3020 6336 2065  d4.d2.09.50.c6.e
            0x0050:  3120                                     1.
    19:55:36.358694 00:50:56:5c:3d:78 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 9000: 
            0x0000:  026e 8922 026f 7564 230f 0380 4543 454e  .n.".oud#...ECEN
            0x0010:  004d 3c9c 1bf4 0000 0000 0000 0000 0000  .M<.............
            0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0040:  6434 2064 3220 3039 2035 3020 6336 2065  d4.d2.09.50.c6.e
            0x0050:  3120                                     1.
    19:55:36.358944 00:50:56:5c:3d:7a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 9000: 
            0x0000:  026e 8922 026f 7564 230f 0380 4543 454e  .n.".oud#...ECEN
            0x0010:  004d 3cbe 41f5 0000 0000 0000 0000 0000  .M<.A...........
            0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0040:  6434 2064 3220 3039 2035 3020 6336 2065  d4.d2.09.50.c6.e
            0x0050:  3120                                     1.
    

    The FreeBSD man page for RE(4) http://www.freebsd.org/cgi/man.cgi?query=re&sektion=4&manpath=FreeBSD+8.1-RELEASE says that "The RealTek 8169, 8169S and 8110S chips appear to only be capable of transmitting jumbo frames up to 7.5K in size."

    Well, it would be somewhat acceptable if the endorsed hardware solution couldn't send jumbo frames of 9000 bytes, but it's really not acceptable that the hardware deadlocks on receiving a 9k frame. It is a gigabit adapter after all. We haven't tested further to see whether any 9k packet will crash the system or whether it requires specific conditions. My view of a solid firewall, though, is that it should not deadlock because of anything received from the network, be it the internal or external side of the wall…

    This is really leaving a bad taste for buying something I thought "was tested to be good" for pfSense.

    We did replicate the situation with 3 different appliances (of the same type) with very repeatable results.

    It seems a dirty fix to have to disable jumbo frames on the switch so that even internal LAN traffic can't use them.

    Has anyone else had similar issues? Any ideas on a better fix than limiting the whole LAN to MTU 1500?
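
To see whether oversized frames are reaching an interface before it locks up, the tcpdump text can be filtered by its length field. The following is only a sketch: the helper name `oversized_frames` and the default threshold of 1518 bytes (1500 MTU + 14-byte Ethernet header + 4-byte VLAN tag) are assumptions chosen for illustration, and the second sample line is invented for contrast rather than taken from the capture.

```shell
#!/bin/sh
# Print only the captured frames whose link-level length exceeds a limit.
# Reads `tcpdump -e` text on stdin; hex-dump lines carry no "length" field
# and are therefore skipped.
# Hypothetical helper; default limit 1518 = 1500 MTU + 14 header + 4 VLAN tag.
oversized_frames() {
    limit="${1:-1518}"
    awk -v limit="$limit" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "length") {
                    len = $(i + 1)
                    sub(/[^0-9].*$/, "", len)   # strip a trailing ":" or ","
                    if (len + 0 > limit) print
                    break
                }
            }
        }'
}

# First line: header from the capture in this thread.
# Second line: made-up normal-sized frame, included only for contrast.
oversized_frames <<'EOF'
19:55:36.358216 00:50:56:5c:3a:a8 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 9000:
19:55:36.400000 00:50:56:5c:3a:a8 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60:
EOF
```

On a live system the same function could be fed from something like `tcpdump -l -e -n -i re1 | oversized_frames`, with re1 being the interface named in this thread.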



  • I would honestly return the boxes if you still can.  You've hit a hardware limitation of the chipset (most likely because of binaries that were/were not supplied to the FreeBSD dev team).  There are many good discussions on this board about RealTek chips as well as other vendors' chips.  There may be a workaround, and possibly 2.1 might address this limitation, but I'm not entirely sure about that.

    Another reason to return the hardware is that it's very expensive and you can probably do a much better job building a box from individual parts.  I built a 2U pfSense box with four Gbit NICs, an i3, and 4GB of RAM for US$400.  I got to choose my hardware and didn't have to worry too much about supportability.

    But I think you have a legitimate grievance with the hardware reseller because they should have disclosed the limitations of their hardware running pfSense.  As you found out, it's clearly documented by the FreeBSD folks.



  • This is a good example of why we're going to launch a certified hardware program soon. "Tested to work" means different things to different people depending on who's doing the testing and what they're testing exactly. Our testing regimen would have caught that, where we'd be able to put a note on the test results that it doesn't support jumbo frames, and maybe do something beyond that.

    I would get in touch with applianceshop and see what suggestions they have. You may have to return the hardware, it sounds like it's a hardware limitation that it's not going to do what you need it to do. But I wouldn't discourage anyone from buying from applianceshop.eu, they're one of the biggest supporters of the project, and in general their hardware is better tested than most anyone else's out there.

    The problem isn't anything they're doing, it's that no one out there does a truly comprehensive test at this point in time. We'll soon be doing that ourselves, in our office, with standardized testing procedures that include a wide range of things that have problems from time to time whether in the hardware itself or in drivers. We hope to get a high adoption rate from our resellers, since it'll avoid circumstances such as this that happen from time to time.

    But note - you're far, far more likely to have problems with random hardware you bought from somewhere else, or building your own box from parts, than you are with one of our recommended hardware vendors. Even if all the pieces are supported, they may not work well in combination, amongst other potential complications.



  • I do know very well the grievances of testing hardware that you assembled yourself. I really wanted to avoid doing that when I authorized the purchase of these units. Overall it seems that trying to use pretested or brand equipment just causes you to be debugging even odder problems.

    I don’t think Applianceshop is any worse than the big players out there (Dell, HP, Cisco etc). All of them are selling products, and listing “what we can’t do” isn’t a big seller. In fact, the previous issue with the setup we are building was due to a known Broadcom teaming bug in Dell hardware.

    Then again, it would really be nice if they mentioned the network adapters used in their hardware. Right now there’s no mention of them on the sales pages (except for models where they use Intel cards).  In a networking appliance the NIC manufacturer/model can be a big decision maker. I fell into the classic “assume they put in good hardware” mindset and didn’t do my groundwork before ordering. Even if I had known there was a Realtek card in there, I wouldn’t have guessed you can crash the hardware by sending jumbo frames; I would just have assumed the card would silently drop them.

    We’ll be in contact with Applianceshop.eu too. But I did want to give a heads-up so that someone searching the forums might have a bit more information when selecting hardware. In conclusion: the hardware is solid, as long as there is zero chance of full jumbo frames arriving at any port.



  • REALTEK JUMBO FRAME - ISSUE SOLVED

    Dear TNX, and anyone seeking a solution for the Realtek jumbo frame issue.

    Although we have not been contacted by TNX or anyone else about this issue we did notice it on the forum just a week ago.
    Since then we have been investigating if the issue is hardware or software related.

    As Realtek states that the RTL8111D(L) does support jumbo frames up to 9K, we quickly concluded that the issue must be related to the drivers.
    To verify whether the issue does indeed appear on the first jumbo frame, we set up a test system with a plain pfSense and a MacBook Pro.
    And indeed… on the first jumbo frame (size did not matter, anything larger than 1500) the appliance crashed.

    So now we had a reproducible situation, and we started to search for a solution.
    First we tried a later version of FreeBSD (9.1); this version did allow us to set jumbo frames and did not crash on receiving them.

    To shorten the rest of the story: we recompiled the kernel to include the new Realtek driver (from their website), and that solved the issue completely.

    To install:

    1. Download from http://www.deciso.com/downloads/opnsense-ghz-realtek-kernel.tgz
    2. Unpack the file and copy it to /boot/kernel/ (the filename should be kernel)

    Best regards,

    Jos Schellevis
    Deciso B.V.
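
For anyone wanting to repeat that kind of test (a single frame larger than 1500 sent at the appliance), the ping payload size follows from simple arithmetic: the MTU minus a 20-byte IPv4 header (no options) minus an 8-byte ICMP header. The sketch below only prints a command rather than running it. The function names are made up for illustration, 192.0.2.1 is a documentation address standing in for the firewall's LAN IP, and the `-D` (Don't Fragment) flag is the FreeBSD/macOS spelling; other systems may use a different option.

```shell
#!/bin/sh
# ICMP payload that yields one unfragmented IPv4 packet of exactly MTU bytes:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Print (do not run) a FreeBSD/macOS-style ping command for a target and MTU.
# -D sets the Don't Fragment bit, -s sets the payload size, -c 1 sends one packet.
jumbo_ping_command() {
    target="$1"
    mtu="${2:-9000}"
    echo "ping -D -c 1 -s $(icmp_payload_for_mtu "$mtu") $target"
}

# 192.0.2.1 is a placeholder address, not one taken from this thread.
jumbo_ping_command 192.0.2.1 9000
```

The sending machine's own interface has to be configured for the larger MTU first, otherwise the packet never leaves it at that size.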



  • @jschellevis:

    I am having a similar issue with pfSense 2.0.3-RELEASE (FreeBSD 8.1-RELEASE-p13, platform nanobsd (4g)) on an OPNsense 5 port Ghz rack edition - 19" pfSense appliance.

    After downloading the file, I am unable to extract it. Do you know why?

    wget http://www.deciso.com/downloads/opnsense-ghz-realtek-kernel.tgz

    tar -zxvf opnsense-ghz-realtek-kernel.tgz

    tar: Unrecognized archive format
    tar: Error exit delayed from previous errors.

    Thank you. 8)



  • €1,099 for that Atom junk…. ouch !!

    you can make 3 (maybe 4 if you know where to shop) systems of that config for €1,099


  • Netgate Administrator

    I'm seeing the archive as corrupt also.  :(

    Steve


  • Rebel Alliance Developer Netgate

    It should just be kernel.gz, not .tgz. The file as-is is simple gzip compressed, no tar, which is why tar doesn't like it. So either they have uploaded the wrong archive, or you should just move that to /boot/kernel/kernel.gz (don't decompress it)

    You may want to make sure you have a good backup first, and/or:

    cp -Rp /boot/kernel /boot/kernel.old
    

    That way you have a copy of the other kernel on hand to boot if that one fails.

    
    # file opnsense-ghz-realtek-kernel.tgz 
    opnsense-ghz-realtek-kernel.tgz: gzip compressed data, was "kernel", from Unix, last modified: Wed Jun 12 10:56:20 2013
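
The same distinction (a plain gzip file versus a gzip-compressed tar archive) can be checked without the `file` utility. This is a rough sketch under stated assumptions: `archive_kind` is a made-up helper name, and it relies only on `gzip` and `tar` being installed.

```shell
#!/bin/sh
# Classify a download as "tar.gz", "gzip" (one compressed file) or "other".
# Hypothetical helper; uses only gzip and tar.
archive_kind() {
    if ! gzip -t "$1" 2>/dev/null; then
        # Not gzip data at all.
        echo other
    elif gzip -dc "$1" 2>/dev/null | tar -tf - >/dev/null 2>&1; then
        # Decompressed stream lists cleanly as a tar archive.
        echo tar.gz
    else
        # Valid gzip, but the contents are not a tar archive.
        echo gzip
    fi
}

# Classify the file given on the command line, if any.
if [ "$#" -gt 0 ]; then
    archive_kind "$1"
fi
```

A result of `gzip` for a file named `.tgz` would match the situation described above, where the download is the compressed kernel itself rather than an archive to extract.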
    

  • Netgate Administrator

    Doh! Should have tried that.  ::)

    If this is just an alternative re(4) driver can it not be loaded as a kernel module instead? That would surely be easier.

    Steve



  • Did this in the shell after uploading kernel.gz to /tmp using the upload function in Diagnostics -> Command Prompt:

    mount -rw /
    cp -Rp /boot/kernel /boot/kernel.old
    rm -rf /boot/kernel/*
    mv /tmp/kernel.gz /boot/kernel/kernel.gz
    chmod +x /boot/kernel/kernel.gz
    mount -w /
    shutdown -h now
    

    After boot, pfSense comes online just fine. After logging in to the shell, there is still just the kernel.gz in /boot/kernel/ and it has not been unpacked. But it still boots just fine?

    Edit: Also tried the above without deleting the files in /boot/kernel/, just renaming the old kernel.gz and copying the new kernel.gz into the /boot/kernel/ folder. After boot the files in /boot/kernel/ still hold the same data, so I guess the kernel.gz did not unpack this way either.
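
A more cautious version of the second attempt (keep everything in /boot/kernel/, back the directory up, replace only kernel.gz) might look like the sketch below. It is an illustration, not a tested procedure for this appliance: `install_kernel_gz` is a made-up name, the optional second argument exists only so the steps can be rehearsed in a scratch directory, and the root filesystem is assumed to be writable already. The FreeBSD loader can read a gzip-compressed kernel directly, which fits the earlier advice not to decompress the file, so kernel.gz staying packed after boot is not by itself a sign of failure.

```shell
#!/bin/sh
# Replace only kernel.gz, keeping a full copy of the old kernel directory.
# Hypothetical helper. $1 = new kernel.gz, $2 = kernel directory
# (defaults to /boot/kernel; pass another path to rehearse safely).
install_kernel_gz() {
    src="$1"
    kernel_dir="${2:-/boot/kernel}"

    # Refuse anything that is not intact gzip data.
    gzip -t "$src" || { echo "not a valid gzip file: $src" >&2; return 1; }
    [ -d "$kernel_dir" ] || { echo "no such directory: $kernel_dir" >&2; return 1; }
    # Never overwrite an existing backup.
    [ ! -e "$kernel_dir.old" ] || { echo "backup already exists: $kernel_dir.old" >&2; return 1; }

    # Back up the whole directory (kernel plus modules), then swap the kernel only.
    cp -Rp "$kernel_dir" "$kernel_dir.old" || return 1
    cp "$src" "$kernel_dir/kernel.gz"
}
```

Example rehearsal: `install_kernel_gz /tmp/kernel.gz /tmp/scratch/kernel`, where /tmp/scratch/kernel is a copy of the real directory.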



  • It was wrong to delete all the files in /boot/kernel.

    But it does not matter. The problem should be solved in the new 2.1 release. Upgrading or reinstalling to fix the issue is probably a better solution.

