Firebox Marvell ports locking up (CORE-E SERIES)



  • I have a Firebox that has become very unstable, to say the least.  It is running 2.2.4 Nano 4G.

    Every now and then, network traffic just stops.  I have 2 WAN ports and 2 LAN ports configured and in use.  I can't tell with the WAN ports, but both LAN ports stop passing traffic.  I have no idea what is causing this, and I can't test anything because I can't reach the box via the web GUI or telnet on either of my LAN IPs.

    I have tried serial and can access it that way, but with my limited understanding of Linux I can't figure out a damn thing. lol.

    So, if there are any logs I can post to help, please let me know.  The only add-ons I run are LCDproc and Snort.

    Any ideas on how to proceed with this?



  • It turns out, Snort was doing it.  I uninstalled it last night and it has been fine since.  Wonder what is up with that!


  • There are actually logs and alerts visible in the Snort package. Perhaps use them and disable the rules that are blocking you? Or disable the blocking feature altogether until you tune the thing? Snort is not an install-and-forget package.



  • Snort seems to contribute to the problem, but I have since removed Snort and it has run well since my last comment.  Today it did it again: both my LAN ports will not allow access to the GUI, even though the system says they are up.  I can see my WAN lights flashing, so I know those are active; I did not think to look at the LAN LEDs.

    It has me stumped.  I did read somewhere that these Fireboxes have issues with the LAN ports, but this one has not really done this in the past.

    Some history as to what went on recently.  One of my WAN ports quit working; I found it had a weird name or type on the network connection and I could not change it.  I ultimately ended up locking myself out of the box, so I had to go in via serial and reset the network ports.

    Once I managed to get in, I reloaded the config backup to restore my settings.  This worked, but for two days it said it was installing and configuring Snort!  I ended that process and rebooted, and all was hunky-dory until I noticed the Snort install was not complete.  I removed Snort and reinstalled it, and after this all was back to normal.

    Well, I thought it was.  Then I started getting the constant port lock-ups; removing Snort seemed to fix that issue, or so I thought.

    Apart from a reinstall from scratch, I am stuck.  I don't want to reload, as it is a serious pain in the ass with these boxes.



  • I am having this same issue where the ports on my Firebox X550e keep locking up.  When they lock up I am getting the message kernel: arpresolve: can't allocate llinfo in the logs.  This is not just on one interface.  Last night both my lan and my wan interface locked up.  If I unplug the ethernet cable and plug it back in things will return to normal.
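
For anyone wanting to see how often that message shows up, here is a minimal Python sketch that counts matching lines in a block of log text. The function name and the way the log text is obtained (for example, pasted from the serial console) are assumptions for illustration; only the arpresolve message itself comes from the post above.

```python
# Message text as reported in this thread; the rest of the log line
# format is not assumed, so matching is a plain substring test.
DEFAULT_MESSAGES = ("arpresolve: can't allocate llinfo",)


def count_matches(log_text, messages=DEFAULT_MESSAGES):
    """Count how many log lines contain each of the given messages."""
    counts = {message: 0 for message in messages}
    for line in log_text.splitlines():
        for message in messages:
            if message in line:
                counts[message] += 1
    return counts
```

Other messages mentioned in the thread can be passed in through the `messages` argument to compare how often each one appears around a lock-up.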



  • @thadrumr:

    I am having this same issue where the ports on my Firebox X550e keep locking up.  When they lock up I am getting the message kernel: arpresolve: can't allocate llinfo in the logs.  This is not just on one interface.  Last night both my lan and my wan interface locked up.  If I unplug the ethernet cable and plug it back in things will return to normal.

    I am glad it's not just me, not glad it's happening to you… you know what I mean.

    I have never tried unplugging the cables.  I guess unplugging them changes the link state that pfSense sees, so the port must reset upon reconnection.  I shall try this next time it happens.  I could not tell you if my WANs lock up; I normally just shut the unit down and boot it back up again.

    It has to be a bug in the driver for the network ports; I can't think of anything else it could be.  Right now I am off to research this, as I know I have read about it somewhere.



  • And here it is…

    Known Issues
    The Realtek NICs in this box are known to suffer a lock-up condition under certain circumstances. Despite repeated efforts it has not been possible to either cure the problem or ascertain exactly what triggers it. When the problem is triggered the system log will show watchdog timeout and refer to the interface causing it. Fortunately this doesn't affect all users and even then only under some circumstances.
    It would seem to be related to packet fragmentation and hardware off loading. Some users have reportedly solved the problem by disabling all hardware offloading and/or using a better switch that can reassemble packets correctly.

    As found here….
    https://doc.pfsense.org/index.php/PfSense_on_Watchguard_Firebox

    Forgive me for saying this, but this is kind of a dumb statement to make

    or using a better switch that can reassemble packets correctly.

    seeing as the ports are built into the Firebox, how does one use a better switch?  Defeats the purpose of the Firebox does it not?



  • That is for the Firebox Core series, not the Core-e series, which has Marvell-based NICs.



  • @thadrumr:

    That is for the Firebox Core series, not the Core-e series, which has Marvell-based NICs.

    You are correct, my mistake but very similar to our problem.



  • How often are you seeing this? Do you mainly see it with heavy traffic? Just trying to compare and to see if there is something you're running that might be contributing to the issue.  Are you running any add-ons?



  • I was only running LADVD and LCDproc-dev and was still having the problem.  I could only get around an hour or so out of the box before it would lock up.  I thought it was Snort that was contributing to the issue, but I had it disabled and still had the problem.  The only thing I can think of is that the config I restored came from a completely different box, so I have now wiped my card and done a re-install, with an upgrade to the latest BIOS, 8.1.  We will see if that helps.  I am currently running a base config that I did by hand with only LCDproc-dev installed.  I will see if it lasts the night and let you know.  I also had IPv6 with prefix delegation running, but I have that turned off on my new install for now.  I will post back in the morning on how things are going.

    Update: everything was still running this morning (Thursday) when I woke up, so things are going OK so far.  I am going to let it run the rest of the day while I am at work.  If it is still stable then I will start adding things back one at a time, starting with IPv6.



  • I am still getting this lockup issue.  Same as yours: if I remove the LAN cable and plug it back in, the port must reset, and it works again.  So seeing the link state go from up to down and back to up must reset something.

    This is getting kinda old.  I am considering pitching this Firebox, as the hardware is not very reliable on this new build.  When I say hardware, I mean it has to be a driver issue for the Marvell network ports.
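
The unplug-and-replug workaround described above could possibly be imitated from the serial console by taking the interface down and up again. The Python sketch below only builds and runs the two standard `ifconfig` commands; the function names and the example interface name `msk1` are assumptions, and whether a software down/up clears the same lock-up as physically pulling the cable has not been confirmed by anyone in this thread.

```python
import subprocess


def bounce_commands(interface):
    """Build the ifconfig commands that take an interface down, then up."""
    return [
        ["ifconfig", interface, "down"],
        ["ifconfig", interface, "up"],
    ]


def bounce(interface, run=subprocess.run):
    """Run the down/up commands in order; `run` can be swapped for testing."""
    for command in bounce_commands(interface):
        run(command, check=True)
```

Calling `bounce("msk1")` as root would run both commands on the box; passing a different `run` callable makes it possible to dry-run the sequence first.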



  • @deanot:

    I am still getting this lockup issue.  Same as yours: if I remove the LAN cable and plug it back in, the port must reset, and it works again.  So seeing the link state go from up to down and back to up must reset something.

    This is getting kinda old.  I am considering pitching this Firebox, as the hardware is not very reliable on this new build.  When I say hardware, I mean it has to be a driver issue for the Marvell network ports.

    Yep. Glad I'm not the only one seeing this issue.  My X750e does the same thing with a fresh 2.2.4 install.  The disconnects in 2.2.4 on the Firebox are unacceptable.  v2.1.5 is solid, and that's what I reverted to from my backup.  I'm ordering parts to build a new, faster system for pfSense, and I am going to test it out thoroughly before actually switching the network over to the new hardware.



  • Wish I could find a way to roll back, I upgraded from the GUI, so it has been overwritten.  I find, the more time I spend in the GUI, the more often it will lock the port up.  I also find, using IE in the GUI is less harsh than running Chrome to access it.



  • Last night it was lock-up after lock-up.  One thing I have now done to the box is drop the LAN port speeds to 100base full duplex.  I do not use the ports for subnet routing; both ports are on the same subnet, and one is for access in case of a lock out (got to love headless boxes).

    Since I have done this, the throughput seems better, the box seems to be more responsive and it has not locked out…. YET.

    I shall update as to how this box is now working out.



  • one is for access in case of a lock out (got to love headless boxes).

    If that spare port were an IPMI port it would be OK, but if you are using a regular port for this rather than an IPMI port,
    you could be creating a network loop! And that could be what is causing the lock-ups!



  • Slowing my port speed down seems to have worked.



  • I'm not using a Firebox but I'm having similar lockups….can't access anything on LAN (and Internet stops) but console still working fine (monitored through IPMI port).  It was suggested in other threads around the web to not set "Autodetect" for the port speed.  Have you by chance set it to 1000T full duplex instead of Autodetect to see if that helps? (if you have done so, my apologies...just curious).

    I'm going to follow this thread as it seems very much like what's happening to my SuperMicro N3700 setup.  If my system locks up again, I'm going to do the same (unplug the LAN and replug to see if it comes back to life).  I'd really hate to lower my port speed, even though it wouldn't affect my network as my ISP is well below 100Meg.

    Good luck.



  • Yeah, I set my ports at 100 base and full duplex.  I think the speed of the ports was causing the issue.  It has been up solid for over 24 hours now.
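
For reference, forcing a fixed speed and duplex from the console (rather than through the pfSense GUI) is normally done with `ifconfig`'s `media` and `mediaopt` keywords. The Python sketch below just assembles that command line; the function names and the interface name `msk1` are assumptions, the media names a given port accepts should be checked with `ifconfig -m <interface>`, and a change made this way is not expected to survive a reboot.

```python
import shlex


def media_command(interface, media="100baseTX", duplex="full-duplex"):
    """Build an ifconfig command that pins the media type and duplex."""
    command = ["ifconfig", interface, "media", media]
    if duplex:
        # mediaopt is only added when a duplex option is requested,
        # so media="autoselect", duplex=None restores auto-negotiation.
        command += ["mediaopt", duplex]
    return command


def as_shell(command):
    """Render the command list as a single shell-safe string."""
    return shlex.join(command)
```

Printing `as_shell(media_command("msk1"))` gives a line that can be reviewed before it is typed at the console.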



  • @deanot:

    Yeah, I set my ports at 100 base and full duplex.  I think the speed of the ports was causing the issue.  It has been up solid for over 24 hours now.

    Did you have it set to 1000T full duplex before that or "Auto Detect"?



  • Sorry, it was on auto detect from install. pfSense even says it should be set to auto detect.



  • @deanot:

    Sorry, it was on auto detect from install. pfSense even says it should be set to auto detect.

    I had read from a few places that setting to auto detect would cause issues similar to yours (and mine) and that setting it to a fixed speed would fix it.  I set mine to 1000T Full Duplex yesterday morning in hopes it would correct the issue.  Now just wait and see.



  • Yeah, mine seems stable, I don't really need 1000Base on mine, it handles internet traffic only.  Seeing as the ports on the modem are 100Mbps max, I see no reason to crank it higher at this time.



  • I have mine set at auto negotiate and it seems to be stable.  I still get the random lockup, but it is now going days between lockups.  The other day I tried to install Snort back on my box and it locked up within an hour of the install.  I have now had Snort off for the last 4-5 days and it only locked up once, on Saturday, and that was with Snort removed from the box.  One thing I did do was increase the size of the /tmp file system, which resides in RAM.  I have plenty of RAM to spare as I am running 2GB in my box.  I bumped it up to 128MB just as a test and, knock on wood, it has been running since Saturday evening Eastern time.



  • My box has been up and running without issue now for 2 days and 21 hours, Snort has been back on it over 2 days now.  Still no issues… Fingers crossed.



  • Mine locked up around 30 minutes ago.  I went to the console and it was working.  Before I could go downstairs to unplug and replug the LAN, it came up with this message (added attachment).  After that, Internet would partially work.  Had to go in and disable IPV6 to get it all working.  Note, the first time it locked up (last week), IPV6 was disabled so I'm not sure it had anything to do with IPV6.  The watchdog seems to be the key here.  Any thoughts?




  • I am going to quote this again. I know we are not talking about Realtek cards, but the symptoms are almost identical.

    Known Issues
    The Realtek NICs in this box are known to suffer a lock-up condition under certain circumstances. Despite repeated efforts it has not been possible to either cure the problem or ascertain exactly what triggers it. When the problem is triggered the system log will show watchdog timeout and refer to the interface causing it. Fortunately this doesn't affect all users and even then only under some circumstances.
    It would seem to be related to packet fragmentation and hardware off loading. Some users have reportedly solved the problem by disabling all hardware offloading and/or using a better switch that can reassemble packets correctly.



  • My box is still doing it, had to reset twice yesterday.  Might slow the ports down to 10Mbps and see what happens.



  • @deanot:

    My box is still doing it, had to reset twice yesterday.  Might slow the ports down to 10Mbps and see what happens.

    Sorry if you've tried this but if not, could you try the MSI/MSIX tweak(s) posted here and see if it helps:

    https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards

    Seems to help some…others, not so much.

    Also seems that this is something released in the 2.2.x series as people go back to 2.1.x and problem goes away for the most part.
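
The MSI/MSI-X tweak is usually applied as boot-time tunables in /boot/loader.conf.local. As a rough sketch, the Python helper below merges a set of tunables into existing loader.conf-style text without duplicating keys. The helper name is hypothetical, and the tunable names `hw.pci.enable_msi` and `hw.pci.enable_msix` used in the usage note are the standard FreeBSD ones as far as I know, so check them against the linked page before relying on them.

```python
def merge_tunables(existing_text, tunables):
    """Return loader.conf-style text with each tunable set exactly once.

    Existing assignments for the given keys are rewritten in place;
    keys that were not present are appended at the end.
    """
    remaining = dict(tunables)
    lines = []
    for line in existing_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            lines.append(f'{key}="{remaining.pop(key)}"')
        else:
            lines.append(line)
    for key, value in remaining.items():
        lines.append(f'{key}="{value}"')
    return "\n".join(lines) + "\n"
```

For example, `merge_tunables(current_text, {"hw.pci.enable_msi": "0", "hw.pci.enable_msix": "0"})` returns the new file contents; writing them back to disk and rebooting is left as a manual step so the result can be reviewed first.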



  • Thanks, I will try that.  The weird thing is, the network was fine, as soon as I tried to login to the server, the port crashed/locked up/died.  It does this not all the time, but seems to be triggered by it.  Very odd if you ask me…



  • Reading some more, I found this thread (https://forum.pfsense.org/index.php?topic=96325.0) which is Intel related, but I'm not so sure it doesn't go beyond that.  It seems FreeBSD 10.1 has an issue that needs to be corrected at the OS level that 'could' be causing this.

    https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199174

    States "In Progress" under status.

    Not sure if this affects Realtek cards or not, but reading through the freebsd.org link, the symptoms sound exactly like mine and yours too.



  • Seems this issue (at least with Intel NICs) has been around for some time:  https://forums.freebsd.org/threads/workaround-freebsd-10-1-sudden-network-down.49264/

    No clue if FreeBSD 10.2 fixes it or not.  However, pfsense 2.1.5 has an older FreeBSD and I'll try that if it's not fixed soon.



  • Whatever it is, as said before, it was exacerbated by the 2.2.x upgrade, in my opinion.  I don't recall seeing it in past releases.  Maybe some have seen it, but personally I have not been using this great software nearly as long as some have.

    I hope there is a fix for it soon.  I would hate to move to something else, but it is becoming too unreliable for me to continue using.



  • @deanot:

    Whatever it is, as said before, it was exacerbated by the 2.2.x upgrade, in my opinion.  I don't recall seeing it in past releases.  Maybe some have seen it, but personally I have not been using this great software nearly as long as some have.

    I hope there is a fix for it soon.  I would hate to move to something else, but it is becoming too unreliable for me to continue using.

    2.2.x was the start of FreeBSD 10.1 with pfsense…as far as I can tell.



  • Interesting…  I am hoping that one of the devs may chime in, maybe have some insight into this.



  • Had first LAN lockup with watchdog timeout in three days.  Just turned off all hardware and checksum offloading, saved and rebooted.  Now to start over monitoring.

    If this doesn't work, I will probably go back to 2.1.5 (if that works) until it's resolved.

    Edit:  After reading around more, I'm more inclined to believe that the error is in FreeBSD and that a fix for the same error on the older Intel cards (em) has already been committed. https://reviews.freebsd.org/D3192

    As to when it will appear in a release, not sure.  It seems to have been sent to Intel Networking for review too, so time will tell.

    As for the OP's issue…if this is a FreeBSD issue (seems to appear on lots of different brands of hardware), hopefully, the FreeBSD patches will fix it.  Don't know enough about that to guess at this point though.



  • 2.1.5 works.



  • Not sure if it matters or not but I just noticed that even though I checked to disable TSO under Advanced, Networking, the system tunable net.inet.tcp.tso was at 1.  Changed it to 0, saved and then added net.inet.tcp.tso=0 to /boot/loader.conf.local just to make sure.

    I guess I'll reboot now instead of waiting a few days because I had the disable TSO option checked before and it made no difference.
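
Since the GUI checkbox and the live value can apparently disagree, it may be worth reading the setting back after a reboot. The Python sketch below parses `sysctl` output of the usual `name: value` form and reports whether `net.inet.tcp.tso` is still 1. The function names are made up for illustration, and the output format is assumed to be the standard FreeBSD one.

```python
import subprocess


def parse_sysctl(output):
    """Parse 'name: value' lines from sysctl output into a dict of strings."""
    values = {}
    for line in output.splitlines():
        name, separator, value = line.partition(":")
        if separator:
            values[name.strip()] = value.strip()
    return values


def tso_enabled(output):
    """Return True if the parsed output shows net.inet.tcp.tso set to 1."""
    return parse_sysctl(output).get("net.inet.tcp.tso") == "1"


def read_tso():
    """Query the running system; only meaningful on the FreeBSD box itself."""
    result = subprocess.run(
        ["sysctl", "net.inet.tcp.tso"],
        capture_output=True, text=True, check=True,
    )
    return tso_enabled(result.stdout)
```

Keeping the parsing separate from the `sysctl` call means the check can also be run against output copied from the serial console.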



  • Keep us informed on how it works out, I have been up for a couple of days without issue so far.  I did make the changes you mentioned earlier in the topic, it could be more stable from the changes, but this thing is temperamental and will flake out when it feels like it.



  • I changed the topic title to reflect more detail on the product.

