Futro s920 + DELL INTEL PRO X3959 crash on heavy load
-
Hi All!
I saw a video about building your own router and loved the idea of making a custom pfSense router, so I bought a Fujitsu Futro S920 and a Dell Intel PRO X3959 NIC. Installation went perfectly and the system ran quite smoothly.
But under heavy load, the system reboots for no apparent reason.
I bought another NIC of the same model and tested it, but the results are the same. I really don't know what to do. I have tried some of the possible solutions from the hardware tuning section, such as:
- kern.ipc.nmbclusters="1000000"
- hw.pci.enable_msix="0"
- hw.pci.enable_msi="0"
- dev.em.0.fc="0"
- dev.em.1.fc="0"
- hw.igb.num_queues=1
but none of those seemed to work.
I have also reinstalled twice, and the problem persists. When stress-testing each network interface separately with iperf, the onboard Realtek Ethernet interface works perfectly. It's the Intel interfaces (em0, em1) that fail. Would someone mind giving me a hand with this? I am out of options.
-
Do you see a crash report after it reboots? Can we see it?
If it reboots instantly without panicking or generating a crash report, it might well be a hardware issue. That NIC should be well supported at this point.
Those tuning options are unlikely to help. The igb queues tunable isn't doing anything for em.
Steve
-
@stephenw10 Hi Steve, thanks for your answer.
The crash does not generate a crash report, and nothing interesting shows up in the system logs. It just plainly reboots. I know it sounds like a hardware issue, but I tried another NIC and it crashed the same way. And as I wrote, if I run the test over the motherboard's onboard Realtek interface, it works fine.
Anything I can do to further debug this?
Thanks!
Manuel
-
You can try enabling a serial console and logging the output.
The hardware might have some logging if it's a server device.
An expansion card creates heat and draws more power. Either of those things could be causing it to reboot if some component is on the limit.
If you can make it crash on demand, do you see anything on the console when that happens?
Steve
-
@stephenw10
Hi, can you guide me through enabling serial output? I am a bit lost with that.
Thanks,
Manuel
-
@manu_hna @stephenw10
Hi,
Just to make sure, I added some thermal paste to the CPU and the temperatures did go down, but the crash still occurs, so I don't think high temperature is the cause. I am running an iperf server on the machine and sending traffic from a computer on the local network (so it hits the LAN interface, em0).
Thanks,
Manuel
-
@manu_hna said in Futro s920 + DELL INTEL PRO X3959 crash on heavy load:
guide me on how to enable serial output?
See https://docs.netgate.com/pfsense/en/latest/hardware/console-types.html
The functionality and how to set it up will depend on details of what your hardware actually provides.
Netgate does provide more specific details on the hardware they sell.
-
@patch
Hi Patch,
Thanks! I read the docs, but I don't currently have a serial cable around :S
But isn't the serial console output the same as what I get over SSH or on the DisplayPort output? When I ran these tests while watching pfSense's screen output, nothing important was shown. It just went off and rebooted.
Please correct me if I am saying something crazy.
Thanks,
Manuel
-
It's the same as the video console, but unlike serial output you can't log it. So if it spews a wall of text at you, or reboots when you're not looking, you won't be able to analyse it.
If nothing at all was shown and it just hard resets immediately, that's almost certainly a hardware issue.
Steve
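Since serial output can be captured and a video console cannot, here is a minimal sketch of a capture script that would run on a second machine wired to the router's serial port. Assumptions: the Futro actually exposes a usable serial port (not confirmed in this thread), the capturing machine sees it through a USB-serial adapter at something like /dev/ttyUSB0, the console runs at 115200 baud (pfSense's usual default; check what your install uses), and the third-party pyserial package is installed. The script and function names are hypothetical.

```python
import sys
import time


def stamp_line(raw: bytes, now: float) -> str:
    """Decode one console line and prefix it with a local timestamp."""
    # errors="replace" keeps garbage bytes (common around a reset) from crashing the logger.
    text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now)) + " " + text


def capture(port: str, log_path: str, baud: int = 115200) -> None:
    """Copy everything the serial console prints into log_path until interrupted."""
    # pyserial (third-party); imported here so stamp_line() works without it.
    import serial

    with serial.Serial(port, baud, timeout=1) as console, open(
        log_path, "a", encoding="utf-8"
    ) as log:
        while True:
            # Returns b"" (or a partial line) when the 1 s timeout expires.
            raw = console.readline()
            if raw:
                log.write(stamp_line(raw, time.time()) + "\n")
                log.flush()


if __name__ == "__main__":
    if len(sys.argv) == 3:
        capture(sys.argv[1], sys.argv[2])
    else:
        print(f"usage: {sys.argv[0]} PORT LOGFILE", file=sys.stderr)
```

For example, `python3 capture.py /dev/ttyUSB0 console.log` while the iperf test runs. Timestamping each line shows what, if anything, the box printed just before it reset; if the log simply stops with no panic message, that fits the hard-reset explanation above.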
-
@stephenw10
Hi Steve,
Yeah, nothing happens. It just reboots :s
The curious thing is that running iperf on Linuxstress does not trigger this hard reset.
Also, if the test is run just after startup (cold system), it sometimes does not fail. But it always does once the temps are high. Could the system be rebooting due to temperature?
Is there any way to check that with pfSense?
Thanks,
Manuel
-
Sure, you can check the CPU temperature. It will show on the dashboard if you enable the sensor in System > Advanced > Miscellaneous.
However it may be some other component overheating.
How are you testing with Linuxstress?
Steve
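To answer the "is it temperature?" question with data instead of watching the dashboard, one option is to log the CPU temperature to disk every couple of seconds and read the last lines after the reboot. A minimal sketch, assuming the thermal sensor is set to amdtemp so that the `dev.cpu.0.temperature` sysctl exists and prints something like `59.0C`, and that a Python 3 interpreter is available on the box (not guaranteed on a stock pfSense install; the same loop could otherwise be a small shell script). Only core 0 is sampled, and the script and function names are hypothetical.

```python
import os
import subprocess
import sys
import time

# Assumption: Thermal Sensors is set to amdtemp, so this OID exists.
SYSCTL_OID = "dev.cpu.0.temperature"


def parse_temp(raw: str) -> float:
    """Convert a sysctl temperature string such as '59.0C' to degrees Celsius."""
    value = raw.strip()
    if value.endswith("C"):
        value = value[:-1]
    return float(value)


def read_temp(oid: str = SYSCTL_OID) -> float:
    """Read one temperature sample via sysctl(8)."""
    result = subprocess.run(
        ["sysctl", "-n", oid], capture_output=True, text=True, check=True
    )
    return parse_temp(result.stdout)


def log_forever(path: str, interval: float = 2.0) -> None:
    """Append a timestamped reading every `interval` seconds until interrupted."""
    with open(path, "a", encoding="utf-8") as log:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {read_temp():.1f}\n")
            log.flush()
            # fsync so the lines written just before a hard reset should be on disk.
            os.fsync(log.fileno())
            time.sleep(interval)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        log_forever(sys.argv[1])
    else:
        print(f"usage: {sys.argv[0]} LOGFILE", file=sys.stderr)
```

Run it in the background during the iperf test, e.g. `python3 templog.py /root/temps.log`, and check the tail of the file after the reset. As noted above, another component may be the one overheating, so a normal CPU reading right before the reset would not rule heat out.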
-
@stephenw10
I have seen that the crash happens at around 80°C and 100% CPU usage.
I am testing with a Linux Stress live USB.
Thanks,
Manuel
-
The CPU core is running at 80°C?
That seems hot. What CPU is it? How hot does it run at idle?
Is the system fan spinning up with temperature as expected?
Steve
-
@stephenw10
It normally runs at 60-70°C. It does not have a fan; the CPU is passively cooled.
Thanks,
Manuel
-
Hmm, well that's not so bad then. Anything passively cooled is expected to run hot, but that is still very hot. What is the actual CPU in that?
-
Ah, OK, I see it's an AMD GX-415GA?
Hmm, that appears to have a max rated temp of 90°C. In which case 80 seems waay too close!
It looks like it has a pretty big heatsink and the case is full of holes, so the first thing I would do is just point a desk fan at it. It should make a pretty huge difference to the temps.
Steve
-
@stephenw10
Hi Steve, thanks for keeping up!
I ran some tests at lower temps (60°C) and it took longer to fail, but it failed anyway. I don't think this one was due to temperature, but who knows... I am really out of options right now.
Anything else I can test or investigate?
Thanks,
Manuel
-
@manu_hna @stephenw10
I tried the desk fan approach but no luck... pfSense still crashed.
Thanks,
Manuel
-
You set the sensor to amdtemp? Those are actually the reported CPU core values?
How old is that device? It looks like it could be almost 10 years old, in which case it could simply have failing components if it has seen a lot of hours.
Steve
-
@stephenw10
Hi Steve,
Yes, those are from the AMD sensor. Maybe this unit has gone bad. Just as a test, I ran the same iperf test under OPNsense and OpenWrt and got the same results...
I have ordered a new one and will see how it goes.
Thanks,
Manuel