SG-5100 hanging on PPPoE WAN Reset Events
-
I have an unusual problem I can't pin down, which occurs on my SG-5100 whenever the WAN PPPoE link resets.
SG-5100
BIOS Version: V1.10_5
2.4.5-RELEASE (amd64). Note: the issue was present with 2.4.4 as well.
Scenario:
I have a VDSL2 modem connected to the 5100's WAN interface. The WAN interface is in PPPoE mode and connects to the Internet Service Provider, so the modem is in bridge mode.
Everything works just fine unless the WAN link goes down, as it often does. The modem itself does not go down and is rock solid in bridge mode.
However, the pfSense FW has major issues and is unable to bring the PPPoE connection back unless I reboot the device.
The WebGUI is extremely slow when this happens. The login page loads, but after entering the username and password, the firewall dashboard does not load for at least 4 minutes after pressing "login". Sometimes it takes even longer. Modifying anything on the firewall at this time is just as slow: applying and saving a config item does complete, but it takes 3-5 minutes.
I'm not sure if there is something I am missing in the configuration which is causing this. However, a reboot brings the system back up just fine and everything returns to normal.
I run 3 OpenVPN tunnels.
LAN -> VPN GW1 + (mix of rule/policy-based routing via different VPN GWs for specific rules as needed)
DMZ -> VPN GW2
FW DNS is set to the VPN provider's public DNS, with Google DNS included as the 3rd entry.
The DNS Forwarding feature is enabled on the FW.
Essentially, the logs indicate that the FW is unable to use the PPPoE link, even though it does appear to start OK when these issues occur. Routing doesn't seem to happen correctly and the VPN tunnels never start. The FW itself is unable to ping the Internet Service Provider, so even though the PPPoE session is up, the IP layer evidently isn't working.
Logs (read bottom up):
Jun 18 12:21:39 php-fpm 19589 /rc.newwanip: IP Address has changed, killing states on former IP Address xx.xx.48.235.
Jun 18 12:21:39 php-fpm 19589 /rc.newwanip: Default gateway setting Interface WAN_PPPOE Gateway as default.
Jun 18 12:21:39 php-fpm 19589 / Reconfigured: new=0 old=1 dropped=0 (services)
Jun 18 12:21:39 xinetd 10062 readjusting service 6969-udp
Jun 18 12:21:39 xinetd 10062 Swapping defaults
Jun 18 12:21:39 xinetd 10062 Starting reconfiguration
Jun 18 12:21:39 php-fpm 19589 /rc.newwanip: rc.newwanip: on (IP address: xx.xx.58.134) (interface: WAN[wan]) (real interface: pppoe0).
Jun 18 12:21:39 php-fpm 19589 /rc.newwanip: rc.newwanip: Info: starting on pppoe0.
Jun 18 12:21:38 check_reload_status rc.newwanip starting pppoe0
Jun 18 12:21:37 php-fpm 5156 /rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan].
Jun 18 12:21:37 php-fpm 5156 /rc.newwanipv6: rc.newwanipv6: Info: starting on pppoe0.
Jun 18 12:21:37 check_reload_status Rewriting resolv.conf
Jun 18 12:21:36 php-fpm 5156 /rc.resolv_conf_generate: The command '/sbin/route delete -host 8.8.8.8' returned exit code '1', the output was 'route: route has not been found delete host 8.8.8.8 fib 0: not in table'
Jun 18 12:21:36 php-fpm 5156 /rc.resolv_conf_generate: The command '/sbin/route delete -host xx.xx.xx.36' returned exit code '1', the output was 'route: route has not been found delete host 203.12.160.36 fib 0: not in table'
Jun 18 12:21:36 php-fpm 5156 /rc.resolv_conf_generate: The command '/sbin/route delete -host xx.xx.xx.35' returned exit code '1', the output was 'route: route has not been found delete host xx.xx.xx.35 fib 0: not in table'
Jun 18 12:21:36 ppp[wan] xx.xx.58.134 -> 10.xx.xx.59
Jun 18 12:21:36 ppp[wan] IPCP: LayerUp
Jun 18 12:21:36 ppp[wan] IPCP: state change Ack-Sent --> Opened
Jun 18 12:21:36 ppp[wan] SECDNS xx.xx.xx.36
Jun 18 12:21:36
After the WAN is up (not much to show, except the VPN not starting and no L3 connectivity):
/rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - -> 192.168.2.1 - Restarting packages.
/rc.newwanip: OpenVPN ID client1 PID 44339 still running, killing.
Reconfigured: new=0 old=1 dropped=0 (services)
rc.newwanip: OpenVPN ID client4 PID 74609 still running, killing
OpenVPN PID written: 15138
Reloading filter
Starting reconfiguration
Reconfigured: new=0 old=1 dropped=0 (services)
Starting packages
/rc.start_packages: Restarting/Starting all packages.
I often observe the following in the logs during the issue. I have played with custom system configurations, which may have reduced these messages:
sonewconn: pcb 0xfffff8000fb53d20: Listen queue overflow: 2 already in queue awaiting acceptance (2 occurrences)
sonewconn: pcb 0xfffff8000f59d1e0: Listen queue overflow: 2 already in queue awaiting acceptance (6 occurrences)
Also possibly relevant to the slow WebGUI response and freezing, the FW dashboard timing out while trying to load:
nginx 2020/06/09 17:28:10 [error] 22976#100217: *1 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 192.168.1.28, server: , request: "POST /widgets/widgets/system_information.widget.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "192.168.1.254", referrer: "//192.168.1.254/"
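To see whether these timeouts cluster on one dashboard widget or hit every page, the nginx error lines could be tallied per request path. The following is only a rough sketch in Python: the regular expression is written against the single log line quoted above, and the function name tally_upstream_timeouts is made up for illustration. Nothing here is a pfSense tool.

```python
import re
from collections import Counter

# Pattern written against the one nginx error line quoted in this thread;
# other nginx versions or log formats may need adjusting (assumption).
UPSTREAM_TIMEOUT = re.compile(
    r'upstream timed out .*?request: "(?P<method>\S+) (?P<path>\S+) [^"]*"'
    r'.*?upstream: "(?P<upstream>[^"]+)"'
)

# The log line from the post, used here as sample input.
SAMPLE = (
    'nginx 2020/06/09 17:28:10 [error] 22976#100217: *1 upstream timed out '
    '(60: Operation timed out) while reading response header from upstream, '
    'client: 192.168.1.28, server: , request: '
    '"POST /widgets/widgets/system_information.widget.php HTTP/2.0", '
    'upstream: "fastcgi://unix:/var/run/php-fpm.socket", '
    'host: "192.168.1.254", referrer: "//192.168.1.254/"'
)


def tally_upstream_timeouts(lines):
    """Count upstream timeouts per (request path, upstream) pair."""
    counts = Counter()
    for line in lines:
        match = UPSTREAM_TIMEOUT.search(line)
        if match:
            counts[(match["path"], match["upstream"])] += 1
    return counts


if __name__ == "__main__":
    for (path, upstream), count in tally_upstream_timeouts([SAMPLE]).items():
        print(count, path, upstream)
```

If every timed-out request points at the same php-fpm socket regardless of path, that would suggest the PHP backend as a whole is blocked rather than one widget misbehaving, though this alone does not say what it is blocked on.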
In any case, I'm at a loss as to what causes these issues, and I'm not seeing much in the logs that points to the root cause.
It would be great to get some more eyes on this and any ideas on the cause.
Many thanks,
-
Hello!
Maybe related to your slow GUI issue...
https://forum.netgate.com/topic/154520/can-t-load-web-gui-when-wan-is-down
...with your primary DNS on the other side of a VPN that isn't coming up.
John
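One way to check the suggestion above would be to probe each configured DNS server directly while the problem is occurring and see which ones answer. This is a minimal sketch in Python, not a pfSense feature: the function names build_query and probe, the 2-second timeout, and the test hostname example.com are all assumptions chosen for illustration. The server addresses are passed on the command line, since the VPN provider's DNS addresses are redacted in the logs.

```python
import socket
import struct
import sys
import time


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.strip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server, hostname="example.com", timeout=2.0):
    """Return seconds until the server answered, or None on timeout or error."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and "network unreachable"
            return None
        # Only accept a reply that echoes our query id.
        if reply[:2] != query[:2]:
            return None
        return time.monotonic() - start


if __name__ == "__main__":
    # Usage (addresses are yours to fill in): python probe_dns.py 8.8.8.8 <vpn-dns-ip>
    for server in sys.argv[1:]:
        elapsed = probe(server)
        status = "no answer" if elapsed is None else f"answered in {elapsed:.3f}s"
        print(f"{server}: {status}")
```

If the servers behind the VPN show "no answer" while the public one responds, that would be consistent with lookups stalling on the unreachable entries, but it would not by itself prove that this is what slows the GUI.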
-
Just a quick update on this ongoing item.
I have had fewer issues with the WAN PPPoE coming back up and obtaining a WAN IP address. I'm not exactly certain why, but I did go through all my settings and made sure that the FW local host can always access DNS, provided the WAN does come up OK. However, I still don't see why DNS would fail to resolve, as the FW itself is able to use the WAN for DNS purposes. I'm not 100% sure how pfSense treats the DNS entries on the firewall, as the first 2 entries are servers located on the VPNs and the 3rd DNS server is public on the WAN. Perhaps this is causing issues if the FW is not cycling through the DNS servers in the list.
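To put rough numbers on the DNS ordering question, here is a deliberately simplified model in Python of a resolver that asks the servers strictly in listed order and waits a fixed timeout on each one that does not answer. The 5-second timeout and 2 attempts are illustrative assumptions, not values read from pfSense, and how pfSense actually orders its lookups is not confirmed here.

```python
def worst_case_lookup_delay(servers_up, timeout_s=5.0, attempts=2):
    """Seconds spent waiting before the first reachable server is asked.

    servers_up: booleans in configured order, True meaning reachable.
    Simplified model (assumption): in each round the resolver asks the
    servers in order and waits timeout_s on every one that stays silent.
    Returns None if no server answers in any round.
    """
    waited = 0.0
    for _ in range(attempts):
        for up in servers_up:
            if up:
                return waited
            waited += timeout_s
    return None


if __name__ == "__main__":
    # Two VPN-side servers down, public third server up.
    print(worst_case_lookup_delay([False, False, True]))  # prints 10.0
```

Under these assumptions every single lookup would cost about 10 seconds while the VPNs are down, and a page that triggers several lookups in sequence would multiply that, which is at least in the same range as the delays described.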
Re: the GUI slowness.
This is an ongoing and unusual issue:
After the system has been up for approx. 10 days, this issue becomes highly noticeable, with the GUI taking a good 3-5 minutes to log in and load the main page.
Initially I thought this only occurred when the WAN dropped and had issues obtaining a public IP. However, as my WAN link has been more stable recently, I can confirm that it does not seem to be tied to the current WAN connectivity. I disabled the version/firmware check for the GUI, so theoretically that should be out of the picture. I can't see anything in the logs which would help isolate this issue. The system is only using 5% of RAM, 20% of disk and 2% CPU, so everything seems OK there.
It would be great to hear from anyone else with similar issues.
Restarting the FW seems to resolve this for another 10 days or so.
-
@scratchydog - Do you still have this issue? Do you see high memory consumption after this happens as well? Does it impact your internet speed?
I am a noob to all this, and am using a mini PC with multiple Ethernet ports to run pfSense (nothing else is running at the moment). I have noticed the same issue with PPPoE WAN, which even showed the connection as up for 15 hours when in fact I had just rebooted pfSense.
I had issues with the PPPoE connection from the beginning, so I'm not sure if these are related at all. While I have used pfSense on another old PC with my current ISP without any issue, on this one I could not get PPPoE to work until I assigned a generated MAC to the WAN. Once connected, I normally get around 78-93 Mbps down, but after a restart it drops to 0.5-3 Mbps. However, when I changed the WAN MAC again and rebooted, everything seemed to work great.
I had seen memory usage below 30% before this happened, but now it is using more than 70% of 6GB without much difference in network activity.
[Btw, pfSense is running as a VM on Proxmox and the NICs have been passed through]
Happy to see if there are any similarities in the issue and get a better outcome for all.
Many thanks!