WAN going UP and DOWN in CE 2.7
-
@Gertjan
Now it's getting weirder. I made the following changes about 8 hours ago:
1.) Removed iperf and nmap packages.
2.) Removed every feed in pfblocker that had errors in pfblockerng.log (curl error etc.) and forced update.
I haven't had a WAN interruption since. Thus, I also had no reloads of pfb_dnsbl and pfb_filter.
Very strange, but for now it seems to work. IDK if this is a 'solution' or just a coincidence, though. I will observe and report back if it's getting worse again,
Mario.
-
@emefff said in WAN going UP and DOWN in CE 2.7:
Very strange, but for now it seems to work
I'm not surprised at all.
Because I've seen the same behavior.
At work I've been using a big ex-server PC for my pfSense needs: loads of GB of RAM, a Xeon processor, etc.
Still, the PHP processes doing all the GUI lifting, and all the PHP scripts like the ones used by pfBlockerNG, are rather limited in their maximum allowed memory (RAM) usage.
I keep this list to a bare minimum.
As soon as there are not tens or hundreds of thousands, but millions of "DNSBL lines" (host names), every "Firewall > pfBlockerNG > Update" Force All run takes far too much time.
For me, 10, 20 seconds is a max. This is also the time that DNS is unavailable to the system and all connected networks.
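To get a feel for how many "DNSBL lines" the feeds add up to, something like the following rough sketch could help. It only counts non-empty, non-comment lines per file in a directory you point it at. The function names `count_feed_entries` and `report` and the `*.txt` pattern are made up for this example; where pfBlockerNG keeps its downloaded feeds is not stated in this thread, so the directory is left as a parameter.

```python
from pathlib import Path


def count_feed_entries(feed_dir: str, pattern: str = "*.txt") -> dict[str, int]:
    """Count non-empty, non-comment lines in each feed file under feed_dir.

    feed_dir and pattern are assumptions of this sketch; adjust them to
    wherever the downloaded feed files actually live.
    """
    counts: dict[str, int] = {}
    for path in sorted(Path(feed_dir).glob(pattern)):
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            counts[path.name] = sum(
                1
                for line in handle
                if line.strip() and not line.lstrip().startswith("#")
            )
    return counts


def report(counts: dict[str, int]) -> str:
    """Format per-feed counts, largest feed first, followed by a total."""
    rows = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    lines = [f"{count:>10}  {name}" for name, count in rows]
    lines.append(f"{sum(counts.values()):>10}  TOTAL")
    return "\n".join(lines)
```

Calling `print(report(count_feed_entries("/path/to/feeds")))` lists the biggest feeds first, which makes it easier to decide which ones to drop when the total runs into the millions.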
My 'cron' (the automatic pfBlockerNG update task) is set to once a day, and files are checked for possible updates once a week.
I'm using a Netgate 4100 max version; it has 4 GB of memory and a 128 GB disk. Still, it's too easy to bring pfBlockerNG to a crawl when using many or big DNSBL feeds.
PHP just isn't the right tool to do that much file parsing.
Btw : I'm using pfBlockerNG "Unbound Python Mode" as this isn't actually an option anymore.
"unbound mode" uses PHP to do the DNS parsing ....
-
@Gertjan
Hello,
I just wanted to report back my findings. 20-30 hours after my last post above, the fun started again. Hotplug events were not that frequent, but increased with running time of the appliance.
DNSBL was yellow in above graphics and I drew a connection to that. If it was out of sync, everything got a lot worse, complete LOC included.
Today, a few hours ago, I also switched to 'unbound python mode' instead of the other mode. Since then, I have not had a single saving event of pfblocker (these occurred very often in the recent past after switching to CE 2.7) and also no hotplugging event of the WAN.
I was too optimistic last time, but with unbound python it looks much better.
Mario.
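For anyone who wants to track how often the hotplug events happen over time, here is a minimal sketch that counts link state changes per interface in saved log lines. It assumes the kernel messages look like `igb0: link state changed to DOWN`; the function name `count_link_events` and the sample lines in the usage note are only illustrations, not taken from the logs in this thread.

```python
import re
from collections import Counter

# Assumed message shape: "<ifname>: link state changed to UP|DOWN".
LINK_EVENT = re.compile(r"\b([a-z]+\d+): link state changed to (UP|DOWN)\b")


def count_link_events(log_lines):
    """Return a Counter keyed by (interface, state) for matching lines."""
    events = Counter()
    for line in log_lines:
        match = LINK_EVENT.search(line)
        if match:
            events[(match.group(1), match.group(2))] += 1
    return events
```

Feeding it the lines of an exported system log (for example `count_link_events(open("system.log"))`, with a file name of your choosing) gives a quick before/after comparison when a package or setting is changed.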
-
Hello again,
two days ago, I did a fresh install of pfSense CE 2.7; to speed it up I did a restore from a backup (Backup & Restore option with .xml).
Sadly, the hotplugging events on the i350 card came back. So it seems a fresh install does not help.
The only thing I can do to make my life easier (the frequent LOC make me want to pull my hair out) is to reboot in the morning.
Today, for the first time, I chose to 'Reroot' because rebooting also often hangs and I do not want to stress that flimsy button on my Prodesk 400 (the button is basically complete junk).
However, pfSense showed me a very informative error report (well, surely informative for an expert, but maybe not for me) that I attached.
I hope someone with more knowledge can please tell us what it is about. I assume it has something to do with the hotplugging events of my NIC,
thanks in advance,
Mario.
-
@emefff said in WAN going UP and DOWN in CE 2.7:
I did a fresh install of pfSense CE 2.7, to speed it up I did a restore from a backup (Backup & Restore option with .xml).
You've managed to make your pfSense identical to the version you had before the reinstall.
At the bit level.
IT people would say : you've done a NOP or No Operation.
Doing nothing would have had the same result.
@emefff said in WAN going UP and DOWN in CE 2.7:
Today, for the first time, I chose to 'Reroot' because rebooting also often hangs ....
Wait.
Did you see it booting ?
You have a screen : you can see the boot process.
When the system wakes up, it knows nothing about disks, NICs or what OS it will be using.
Then the BIOS locates a bootable drive, reads the boot sector and loads whatever is mentioned over there.
This will load the FreeBSD kernel.
The FreeBSD kernel doesn't know it will be a 'pfSense' system.
It will enumerate all the hardware it finds in the system.
If booting stops in that process, the issue is not 'pfSense' but the kernel having a hard time with a device, like a NIC driver.
The report tells me :
You've demanded a system reboot.
Then the kernel hits a VM (virtual memory) fault.
That's not a pfSense thing; for me, that's the thin boundary between the kernel and your hardware.
Try another system ^^
Btw : to motivate you : I've been using pfSense for more than a decade and never had to hardware-reset it. Not sure my device has a reset button.
Btw : resetting a device like pfSense is (can be) bad for the file system. Press the reset button on your Windows PC several times, and it won't boot anymore either.
Also : install snort only if you are sure your device has been rock solid for the past xx months.
This goes for any very resource-demanding software.
Again : these words are mine - I'm just another pfSense user.
-
@Gertjan
It wasn't clear before the fresh install that it would be the same. If it was so perfect, how come it's problematic since 2.7? And no, it is not the same bit for bit. There are no logs and other files that change during operation etc. Also, files are in different places on the SSD.
No, I did not see it booting.
It's not my hardware that is at fault, it's CE 2.7. My hardware worked for years with 2.5 and 2.6 and I did not change any major stuff in 2.7.
I won't try another system, but if I do, it won't be pfSense.
Mario.
-
@emefff said in WAN going UP and DOWN in CE 2.7:
how come it's problematic since 2.7?
Can't tell. Most probably the (a new) NIC driver included in the kernel.
Previous pfSense versions used FreeBSD 12.x; pfSense 2.7 uses FreeBSD 14.
Like Windows 10 before and now Windows 11 : there are differences ^^
@emefff said in WAN going UP and DOWN in CE 2.7:
And no, it is not the same bit for bit. There are no logs and other files that change during operation etc. Also, files are in different places of the SSD.
I meant : running the same 'code'.
-
@emefff
This seems to be something I struggled with for 2 weeks and finally seem to have solved. The first thing to do is go to System > Routing > Gateways > Edit > Show Advanced Options > Packet Loss Thresholds and change the default value from 10-20 to 10-75.
The second thing I did was remove the interface and rebind it (I had to fix it wherever it was referenced: NAT, OpenVPN, etc., which was a lot).
The third thing I did was turn off the machine, completely de-energize it, and wait until the lights on the network card stopped blinking (WoL). I waited another minute and only after that turned it on.
As a result, the problem went away... https://docs.netgate.com/pfsense/en/latest/hardware/tune.html There are a lot of tips here, BUT I suggest doing this only if there are problems! Therefore, I advise you to remove all the tuning that you may have done.
-
@Stef93 said in WAN going UP and DOWN in CE 2.7:
@emefff
This seems to be something I have been struggling with for 2 weeks and finally seems to have made it. The first thing to do is System/RoutingGateways/Edit > Show Advanced Options > Packet Loss Thresholds to change the default value from 10-20 to 10-75.
The second thing I did was remove the interface and rebind it (I had to fix it wherever nat, openvpn, etc. were indicated, a lot)
the third thing I did was turn off the machine, completely de-energize it and make sure until the lights on the network card stopped blinking (wol) I waited another minute and only after that turned it on.
As a result, the problem went away... https://docs.netgate.com/pfsense/en/latest/hardware/tune.html There are a lot of tips here, BUT, I suggest doing this only if there are problems! Therefore, I advise you to remove all the tuning that you could do.
It takes a long time to explain these points, but they were carried out at intervals of 2-3 days. For example, I changed 10-20 to 10-75 in order to see the losses; with the default setting, at more than 20% loss it considers the gateway lost.
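The effect of the threshold change can be sketched like this. It is a simplified model only, not the code the gateway monitor actually runs: the function name `gateway_status`, the status labels, and the strict "greater than" comparison are assumptions made for illustration.

```python
def gateway_status(loss_percent: float, low: float = 10.0, high: float = 20.0) -> str:
    """Classify a gateway from its measured packet loss (simplified model).

    low/high mirror the two Packet Loss Threshold fields (10-20 by default
    according to the post above). Comparisons are assumed to be strict.
    """
    if not 0 <= low < high <= 100:
        raise ValueError("thresholds must satisfy 0 <= low < high <= 100")
    if loss_percent > high:
        return "down"      # above the high threshold: gateway treated as lost
    if loss_percent > low:
        return "warning"   # between the thresholds: alarm only
    return "online"
```

With the defaults, `gateway_status(30)` comes out as `"down"`, while `gateway_status(30, high=75)` stays at `"warning"`, so the gateway keeps being used and the loss stays visible in the graphs instead of triggering a failover.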
-
Hello,
I tried the threshold change from 10-20 to 10-75 and increased the MBUFS, nothing changed.
What did seem to get rid of the hotplugging events (at least for now 24h without any hotplugging event, which was NEVER the case since 2.7) is changing from snort to suricata. The funny thing is: sometime in CE 2.6 I changed from suricata to snort because of trouble with suricata.
I am confident this is solved, but will report back if I was wrong,
thanks everybody,
Mario.
-
Hi,
I have been running this config without unplanned hotplugging events of NIC for more than a week now. It was definitely the Snort package that caused these events,
Mario.
-
Similar problem in CE 2.7.2 in August 2024