Router slows to crawl every 3 days, needs reboot - suggestions?
I don't know where to start to diagnose this. Latest 64-bit version of pfSense. Completely default settings. Nothing extra installed. Just typical home consumer use, no servers, no torrents, nothing. FX-6200 system with all Intel ports.
Router works perfectly for 2-3 days. Fast speedtest.net ping and download speeds: 10 ms ping, 933 Mb/s download, 41 Mb/s upload. Then suddenly between day 3 and day 4 my ping goes up to 20 ms and downloads and uploads drop to around 10 Mb/s or less. This is not gradual - the night before I'll be at full speed, the next day I'm slowed to a crawl. Reboot the router and it's instantly back to fast. This is not the modem or the switch or something else in the network, it's pfSense.
My old ASUS router never needed to be rebooted. Nothing like this ever happened before, all other equipment is the same as ever.
It has done this every 3-4 days for weeks (since I got the bright idea to build my own router), and I can't find anything wrong. It always takes at least 2.5 days to slow down, never happens sooner, and it has never been able to run more than 4 days without slowing down.
I see nothing in the logs or statuses indicating a problem. Nothing on the dashboard is meaningfully different when it's slow vs when it's fast. CPU and memory utilization unchanged, still running at the same speeds either way. All "statuses" that I look at are green, and nothing weird in the logs that I can see, although I am no expert at networking.
That said, I'm not much of a network person, I may not know where to look for errors. Before I just try a different motherboard - any thoughts? I have the latest BIOS, all Intel ports, only 2 memory sticks matched exactly and using default settings (that are confirmed correct for the MB/Processor). This machine was a reliable PC before being turned into a router. I have 25 years experience building PCs, this is not a case of forgetting to plug in the heatsink fan or something obvious like that.
And this is installed on bare metal, no VM. It's as default as default gets.
Are we talking 2.4.5 or 2.4.5-p1 here?
Latest version as I said, the -p1. But it was the same before the p1 update came out.
@netblues it's p1, but it was the same before updating to P1. Initial install was before P1 came out but I installed the update through the notification on the dashboard.
Strange as it is, let's rule out a few things.
Are you using pfBlockerNG?
@netblues Honestly I don't even know what those things are. I literally installed the software, assigned ports from the command line, and let it run. That's what I mean by default - like nothing but default install. I don't even know what else I could do. It works perfectly until it doesn't. I log into the dashboard just to try to figure out what is going wrong, no configuration changes.
Just now I went into the BIOS looking for power management settings that might somehow be affecting performance over time. Turned off "Cool & Quiet", "Application power management" and "Core turbo boost" - if that makes any difference I'll report in 4 days or so. In the meantime I'm open to suggestions.
Comcast, Arris SB8200 modem. The modem connects to the router by DHCP and there is no way to change that. There are no relevant settings available in the modem admin page.
I am really not a networking person but I'm thinking what if the modem wants to renew its lease every 3 days and something is going wrong in that process? The 3 days thing just seems so weird. Any thoughts on that?
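One way to test the lease theory is to read the renew, rebind, and expire times that the DHCP client recorded for the WAN interface. The sketch below assumes the stock FreeBSD dhclient lease file location, /var/db/dhclient.leases.<interface>, and uses igb0 purely as a placeholder interface name; both should be checked against your own system.

```shell
#!/bin/sh
# Print the timing fields (renew / rebind / expire) from a dhclient lease file.
# Assumption: the lease file lives at /var/db/dhclient.leases.<interface>,
# as on stock FreeBSD; "igb0" in the example is a placeholder interface name.

lease_times() {
    # $1 = path to a dhclient lease file
    grep -E '^[[:space:]]*(renew|rebind|expire)[[:space:]]' "$1"
}

# Example: lease_times /var/db/dhclient.leases.igb0
```

If the expire or renew time lines up with the moment the slowdown starts, the lease theory gains some weight; if the times are unrelated, it can probably be ruled out.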
And finally, is there any way I could just script/automate the router to automatically reboot every day at 3am? I would be fine with that if there is no other solution.
@feedyourtv You can try disabling and re-enabling the WAN from the web interface and see if this "fixes" the issue.
There is also a command for that:
/etc/rc.linkup stop wan (replace with your interface)
/etc/rc.linkup start wan (replace with your interface)
Adding the cron package allows you to schedule this.
A restart is also possible with the same method.
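For the scheduled restart, a cron job needs the same fields as a line in a system crontab: minute, hour, day of month, month, day of week, user, and command. A minimal sketch of a daily 3 a.m. reboot follows. The variable names are only illustrative labels (assumed to mirror the fields the Cron package asks for), and /sbin/shutdown -r now is the standard FreeBSD reboot command rather than anything pfSense-specific.

```shell
#!/bin/sh
# Build the crontab-style entry for a daily 03:00 reboot.
# Variable names are illustrative labels, not pfSense settings.
minute=0
hour=3
mday='*'      # every day of the month
month='*'     # every month
wday='*'      # every day of the week
who=root      # user the job runs as
command='/sbin/shutdown -r now'

# Quoting keeps the literal asterisks from being expanded by the shell.
cron_entry="$minute $hour $mday $month $wday $who $command"
printf '%s\n' "$cron_entry"
```

This only prints the entry so the fields can be checked before typing them into the Cron package; it does not install the job or reboot anything.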
Rest assured, there are thousands out there with exact configurations like yours and they work fine. We just need to find the exact cause in your situation.
@feedyourtv When it happens, check Status > Monitoring, select Quality (ping test) and WAN.
Can you see whether the ping rises when it happens?
Also have a look at Status > Monitoring, select System, and then the pages Memory, Processor, States, etc.
Also check the logs under Status > System Logs > System: is there anything unusual around that moment?
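Since the dashboard can look the same in both states, it may help to save the raw counters once while the router is fast and again while it is slow, then compare the two files. The sketch below relies on standard FreeBSD tools (netstat, vmstat, pfctl). The function name and the file paths in the example are made up for illustration, and pfctl needs root.

```shell
#!/bin/sh
# Save a snapshot of interface, mbuf, interrupt and pf counters to a file.
# Assumption: run as root from the pfSense shell; "snapshot" and the
# example paths are hypothetical names chosen for this sketch.

snapshot() {
    out="$1"
    {
        echo "=== date ==="
        date
        for cmd in "uptime" "netstat -i" "netstat -m" "vmstat -i" "pfctl -si"; do
            echo "=== $cmd ==="
            # The first word is the binary; note it and move on if missing.
            bin=${cmd%% *}
            if command -v "$bin" >/dev/null 2>&1; then
                $cmd 2>&1
            else
                echo "(not available: $bin)"
            fi
        done
    } > "$out"
}

# Example:
#   snapshot /root/fast.txt    # shortly after a reboot
#   snapshot /root/slow.txt    # once the slowdown has started
#   diff /root/fast.txt /root/slow.txt
```

Things worth comparing in the diff include interface error and drop counts (netstat -i), mbuf usage and denied requests (netstat -m), interrupt rates (vmstat -i), and the state table size (pfctl -si).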
OK so my BIOS changes to power options did nothing, a reboot was needed in a little over 2 days. I then tried disabling hardware checksum offload, which also did nothing. I'm pretty close to trying a different motherboard, but I'll at least go through one more cycle to see if I can find the problem. So far nothing I have tried will restore performance, short of a complete reboot. However, I have not tried the latest suggestions.
Current uptime is 1 day 16 hours and it runs as fast as if it was just rebooted - no sign of a gradual slowing.
I will try the above suggestions the next time it slows - should be in the next 24 to 36 hours. Last reboot I made no configuration changes, I just needed it to work at that moment.
OK well nothing fixed the problem so I installed cron, quickly found a guide on how to schedule nightly reboots, and now the system is working flawlessly. I'm asleep when it reboots so the downtime is not an issue. After a couple weeks of no problems, we seem to be stable every day for 24 hours, and that's all I need.
Running cron daily has possibly revealed the real issue. I am now getting crash reports fairly regularly, but not every day. It only happens when cron reboots the system. When cron runs, it often throws a page fault, according to the logs. There may be some issue that is not revealed until cron tries to reboot. I had never seen a crash report from manual rebooting, however. So who knows. It always reboots successfully, so this is not a problem in my book.
So it's possible there is something wrong with the RAM or motherboard, but based on my usage of this system in the past with no issues as a Windows PC, I think it's much more likely that BSD or pfSense doesn't handle an uncommon and unpopular processor like the FX-6200 very well. And ASRock motherboards are always suspect in my book. But we'll never know. I'm not exactly impressed with the stability of this software, I don't think it's as good as some people seem to think it is, but it's working and performance is fantastic.
I now have link aggregation running off my modem getting 1200Mb/s downloads pretty much all the time. That's why I was trying to run pfsense in the first place, so it is doing its job. That's 28% faster than the max I could get with a single connection, and it really can pull those sustained download speeds from Steam and other fast servers. Pretty sweet.
@feedyourtv Most probably a hardware issue. The fault will propagate sooner or later.
It is usually due to faulty decoupling capacitors on the motherboard.
Saying that ASRock boards are problematic is as invalid as saying that Windows never blue-screens.
If it was a software issue, this board would be full of complaints, rest assured.
Have fun, as long as it works out for you this way :)