if_pppoe with frequent connection losses due to ISP connection making firewall unstable
-
@stephenw10
So basically, the frequent connection losses and restores create a race condition when services are being restarted?
-
Potentially.
Try doing one manual reconnect and see what is logged until it's stable again. How long does it spend reloading everything?
-
@stephenw10 I cannot reliably do this test right now. It is losing the connection almost every 20 seconds. Was the new version tested for such extreme cases?
-
It should be fine. Or at least no worse than mpd5/netgraph. Bouncing the WAN every 20s is going to be pretty disruptive even on a default install. With all the services and additional sub-interfaces you have, it's going to be a lot for the firewall to do. If it takes longer than 20s to reload all the tunnels and services then it could be in a continuous churn with high load.
-
@stephenw10 I have disabled most of the services and some gateway monitors but this still did not help.
It appears that after getting this screen a simple CTRL+C brings the firewall back online which indicates the system is getting stuck on some process.
Unless I catch this immediately, I don't think it is possible to trace the logs.
Is there a way to increase the log size or log rotation so that the next time this happens we can trace it? At this point, I am pissed at my ISP :D.
-
@stephenw10
And there are some new interesting logs with unbound:
Jul 15 14:23:57 FIREWALL php-fpm[92187]: /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1752578637] unbound[61748:0] error: bind: address already in use [1752578637] unbound[61748:0] fatal error: could not open ports'
-
How long do your logs last? They should just rotate and store unless you're running ram disks. You can only display up to 2000 lines in the webgui but if you look in /var/log directly there should be far more.
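As a rough illustration of looking in /var/log directly, here is a minimal Python sketch that searches the live system log and its rotated copies for a string. It assumes the rotated files sit next to the live log as system.log.0, system.log.1 and so on, possibly with a compression suffix depending on the log settings. The helper names `open_log` and `search_logs` are made up for this example, and it can just as well be run against a copy of the files on another machine.

```python
import bz2
import glob
import gzip
import lzma
import os

# Assumption: rotated logs may be plain text or compressed; pick an opener by suffix.
OPENERS = {".bz2": bz2.open, ".gz": gzip.open, ".xz": lzma.open}


def open_log(path):
    """Open a plain or compressed log file in text mode."""
    opener = OPENERS.get(os.path.splitext(path)[1], open)
    return opener(path, "rt", errors="replace")


def search_logs(needle, pattern="/var/log/system.log*"):
    """Return (path, line) pairs for every line that contains needle."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        with open_log(path) as fh:
            for line in fh:
                if needle in line:
                    hits.append((path, line.rstrip("\n")))
    return hits


# Example call (hypothetical search term taken from the log line quoted above):
# for path, line in search_logs("rc.newwanip"):
#     print(f"{path}: {line}")
```

Files are visited in name order rather than strictly by age, so sort on the timestamps in the lines themselves if the exact sequence matters.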
If you hit that at the console again, try hitting Ctrl+T before Ctrl+C. That should show what process is running and/or stuck.
-
That unbound log is quite common when it restarts. It can try to start that before the previous process has stopped. It shouldn't cause a problem.
-
@stephenw10
Unfortunately, it is not enough. In a couple of hours it fills the logs all the way up to system.log.6. See my first post about it.
-
You can set the size it rotates at and the number of files to retain in the log settings at Status > Logs > Settings. As long as you have the space you should be able to increase it.
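To get a feel for how far to raise those two settings, a back-of-the-envelope estimate can help. The sketch below assumes the log grows at a roughly constant rate and that each retained file holds about one rotation size worth of log text. All numbers in the comments are placeholders, not measurements from this system, and the function names are hypothetical.

```python
import math


def retention_hours(rotate_size_kib, retained_files, log_rate_kib_per_hour):
    """Estimate how many hours of history the live log plus rotated files cover."""
    total_kib = rotate_size_kib * (retained_files + 1)  # +1 for the live log
    return total_kib / log_rate_kib_per_hour


def files_needed(target_hours, rotate_size_kib, log_rate_kib_per_hour):
    """Estimate how many rotated files to retain to cover target_hours."""
    total_files = math.ceil(target_hours * log_rate_kib_per_hour / rotate_size_kib)
    return max(total_files - 1, 0)  # the live log counts as one file


# Placeholder example: at 2000 KiB of log per hour with a 500 KiB rotation size,
# covering 24 hours would take files_needed(24, 500, 2000) retained files.
```

Measure the real rate first, for example from how quickly the rotated files turn over, then check that the resulting total fits in the free space on /var/log.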