SG-5100 hanging on PPPoE WAN Reset Events



  • I have an unusual problem on my SG-5100 whose cause I can't identify. It occurs whenever the PPPoE WAN link resets.

    SG-5100
    Bios Version: V1.10_5
    2.4.5-RELEASE (amd64). Note: the issue was also present with 2.4.4.

    Scenario:
    I have a VDSL2 modem in bridge mode connected to the 5100's WAN interface, and the 5100 runs PPPoE to the Internet Service Provider.

    Everything works fine until the WAN link goes down, which happens often. The modem itself stays up and is rock solid in bridge mode.
    However, pfSense has major issues and cannot bring the PPPoE connection back unless I reboot the device.

    The WebGUI is extremely slow when this happens. The login page loads, but after I enter the username/password and press "Login", the dashboard takes at least 4 minutes to appear, sometimes longer. Applying and saving any configuration change at this point also takes 3-5 minutes. The change does complete, but it really does take up to 5 minutes to save.

    I'm not sure whether something is missing from my configuration that causes this. A reboot, however, brings the system back up and everything returns to normal.

    I run 3 OpenVPN instances:
    LAN -> VPN GW1 (plus a mix of rule/policy-based routing via different VPN gateways for specific rules as needed)
    DMZ -> VPN GW2

    The FW's DNS servers are set to the VPN provider's public DNS servers, with Google DNS as the 3rd entry.
    The DNS Forwarder is enabled on the FW.

    Essentially, the logs indicate that the PPPoE session itself does come up when this happens, but the FW is unable to use it: routing doesn't seem to work correctly and the VPN tunnels never start. The FW itself cannot ping the Internet Service Provider, so even though PPPoE is up, IP-layer connectivity clearly isn't.

    Logs (newest first, so read bottom up):

    Jun 18 12:21:39 php-fpm 19589 /rc.newwanip: IP Address has changed, killing states on former IP Address xx.xx.48.235.
    Jun 18 12:21:39 php-fpm 19589 /rc.newwanip: Default gateway setting Interface WAN_PPPOE Gateway as default.
    Jun 18 12:21:39 php-fpm 19589 /
        Reconfigured: new=0 old=1 dropped=0 (services)
    Jun 18 12:21:39 xinetd 10062 readjusting service 6969-udp
    Jun 18 12:21:39 xinetd 10062 Swapping defaults
    Jun 18 12:21:39 xinetd 10062 Starting reconfiguration
    Jun 18 12:21:39 php-fpm 19589 /rc.newwanip: rc.newwanip: on (IP address: xx.xx.58.134) (interface: WAN[wan]) (real interface: pppoe0).
    Jun 18 12:21:39 php-fpm 19589 /rc.newwanip: rc.newwanip: Info: starting on pppoe0.
    Jun 18 12:21:38 check_reload_status rc.newwanip starting pppoe0
    Jun 18 12:21:37 php-fpm 5156 /rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan].
    Jun 18 12:21:37 php-fpm 5156 /rc.newwanipv6: rc.newwanipv6: Info: starting on pppoe0.
    Jun 18 12:21:37 check_reload_status Rewriting resolv.conf
    Jun 18 12:21:36 php-fpm 5156 /rc.resolv_conf_generate: The command '/sbin/route delete -host 8.8.8.8' returned exit code '1', the output was 'route: route has not been found delete host 8.8.8.8 fib 0: not in table'
    Jun 18 12:21:36 php-fpm 5156 /rc.resolv_conf_generate: The command '/sbin/route delete -host xx.xx.xx.36' returned exit code '1', the output was 'route: route has not been found delete host 203.12.160.36 fib 0: not in table'
    Jun 18 12:21:36 php-fpm 5156 /rc.resolv_conf_generate: The command '/sbin/route delete -host xx.xx.xx.35' returned exit code '1', the output was 'route: route has not been found delete host xx.xx.xx.35 fib 0: not in table'
    Jun 18 12:21:36 ppp [wan] xx.xx.58.134 -> 10.xx.xx.59
    Jun 18 12:21:36 ppp [wan] IPCP: LayerUp
    Jun 18 12:21:36 ppp [wan] IPCP: state change Ack-Sent --> Opened
    Jun 18 12:21:36 ppp [wan] SECDNS xx.xx.xx.36
    Jun 18 12:21:36
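    Because the pfSense log view lists entries newest first, it can help to flip an exported excerpt into chronological order before reading it. The sketch below assumes each entry is on one line starting with a syslog-style "Mon DD HH:MM:SS" timestamp (no year, so the year is a parameter defaulting to 2020, when this thread was posted); lines without a timestamp are skipped. The function name `chronological` is hypothetical, not part of pfSense.

```python
import re
from datetime import datetime

# Assumed layout: "Jun 18 12:21:39 <process> <pid> <message>", newest first.
TS_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2})\s+(.*)$")


def chronological(lines, year=2020):
    """Return (datetime, text) pairs oldest-first from newest-first log lines.

    Lines without a leading timestamp (or with nothing after it) are skipped.
    The year is an assumption, because syslog timestamps omit it.
    """
    entries = []
    for line in lines:
        m = TS_RE.match(line.strip())
        if not m:
            continue
        mon, day, hms, text = m.groups()
        when = datetime.strptime(f"{year} {mon} {day} {hms}", "%Y %b %d %H:%M:%S")
        entries.append((when, text))
    # Reverse first so entries sharing a timestamp come out oldest-first,
    # then a stable sort fixes any lines that were out of order anyway.
    entries.reverse()
    entries.sort(key=lambda e: e[0])
    return entries


if __name__ == "__main__":
    sample = [
        "Jun 18 12:21:39 php-fpm 19589 /rc.newwanip: rc.newwanip: Info: starting on pppoe0.",
        "Jun 18 12:21:38 check_reload_status rc.newwanip starting pppoe0",
        "Jun 18 12:21:36 ppp [wan] IPCP: LayerUp",
    ]
    for when, text in chronological(sample):
        print(when.strftime("%H:%M:%S"), text)
```

    Reversing before the stable sort matters: several entries above share the same second, and a plain sort on the newest-first list would keep those ties in reverse order.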

    After the WAN is up (not much to show, except that the VPNs don't start and there is no L3 connectivity):

    /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - -> 192.168.2.1 - Restarting packages.
    /rc.newwanip: OpenVPN ID client1 PID 44339 still running, killing.
    Reconfigured: new=0 old=1 dropped=0 (services)
    rc.newwanip: OpenVPN ID client4 PID 74609 still running, killing
    OpenVPN PID written: 15138
    Reloading filter
    Starting reconfiguration
    Reconfigured: new=0 old=1 dropped=0 (services)
    Starting packages
    /rc.start_packages: Restarting/Starting all packages.

    During the issue I often see the following in the logs. I have experimented with some custom system configuration changes, which may have reduced how often these messages appear:

    sonewconn: pcb 0xfffff8000fb53d20: Listen queue overflow: 2 already in queue awaiting acceptance (2 occurrences)
    sonewconn: pcb 0xfffff8000f59d1e0: Listen queue overflow: 2 already in queue awaiting acceptance (6 occurrences)
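    For context, "Listen queue overflow" means a listening socket's queue of completed connections waiting for accept() was full, so new connections were dropped. The log does not say which process owns the pcb; linking it to php-fpm (as the nginx upstream timeout further down hints) is an assumption. The toy model below is a hypothetical illustration, not kernel code: `backlog` caps the waiting connections, and once the server stops accepting, every new arrival is dropped. The real kernel applies its own overflow threshold, which is not exactly the backlog value.

```python
from collections import deque


class AcceptQueue:
    """Toy model of a listening socket's accept queue (not FreeBSD code)."""

    def __init__(self, backlog):
        self.backlog = backlog    # max completed connections awaiting accept()
        self.pending = deque()
        self.dropped = 0

    def arrive(self, conn_id):
        # A full queue means the new connection is dropped (the "overflow").
        if len(self.pending) >= self.backlog:
            self.dropped += 1
            return False
        self.pending.append(conn_id)
        return True

    def accept(self):
        # The server process takes the oldest waiting connection, if any.
        return self.pending.popleft() if self.pending else None


def simulate(backlog, arrivals_per_tick, accepts_per_tick, ticks):
    """Return how many connections were dropped over `ticks` steps."""
    q = AcceptQueue(backlog)
    conn = 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            q.arrive(conn)
            conn += 1
        for _ in range(accepts_per_tick):
            q.accept()
    return q.dropped


if __name__ == "__main__":
    print("server keeping up:", simulate(2, 1, 1, 10))  # no drops
    print("server stalled:   ", simulate(2, 1, 0, 10))  # queue fills, then drops
```

    In this model a listener whose process is stuck (for example, blocked on slow lookups) overflows within a few arrivals, which would fit a GUI that hangs rather than one that is merely slow.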

    Also possibly relevant to the slow and freezing WebGUI: the dashboard appears to time out while loading:

    nginx 2020/06/09 17:28:10 [error] 22976#100217: *1 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 192.168.1.28, server: , request: "POST /widgets/widgets/system_information.widget.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "192.168.1.254", referrer: "//192.168.1.254/"

    In any case, I'm at a loss as to what causes these issues, and I'm not seeing much in the logs that hints at the root cause.
    It would be great to get some more eyes on this and any ideas on the cause.

    Many thanks,



  • Hello!

    Maybe related to your slow GUI issue...

    https://forum.netgate.com/topic/154520/can-t-load-web-gui-when-wan-is-down

    ...with your primary DNS on the other side of a VPN that isn't coming up.

    John



  • Just a quick update on this ongoing item.

    I have had fewer issues with the PPPoE WAN coming back up and obtaining a WAN IP address. I'm not exactly certain why, but I did go through all my settings and made sure that the FW itself can always reach DNS if the WAN does come up. However, I still don't see why DNS resolution would fail, since the FW itself can use the WAN for DNS. I'm not 100% sure how pfSense treats the DNS entries on the firewall, as the first 2 entries are servers located across the VPNs and only the 3rd is a public server reachable via the WAN. Perhaps this causes problems if the FW does not cycle through the DNS servers in the list.
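    To put rough numbers on John's suggestion: even if the FW does fall through to the 3rd server, every lookup can first wait out the timeouts of the two VPN-hosted servers. The sketch below is a deliberately simplified model, not FreeBSD's exact resolver algorithm: servers are tried in list order, each unreachable one costs `timeout_s`, the first reachable one answers at once, and the list is retried `attempts` times if nothing answers. The defaults (5 s, 2 attempts) are assumptions chosen because they are common resolver defaults.

```python
def worst_case_lookup_delay(servers_reachable, timeout_s=5.0, attempts=2):
    """Seconds spent before a lookup gets an answer, under a simple model.

    servers_reachable: list of booleans in resolver order
    (True = that server can currently answer).
    """
    delay = 0.0
    for _ in range(attempts):
        for reachable in servers_reachable:
            if reachable:
                return delay        # first reachable server answers immediately
            delay += timeout_s      # wait out the timeout, then try the next one
    return delay                    # nobody answered: total time before failure


if __name__ == "__main__":
    # VPN DNS 1 and 2 down (tunnels not up), Google DNS via the WAN reachable.
    print(worst_case_lookup_delay([False, False, True]))   # 10.0
    # All three unreachable (e.g. WAN itself has no L3 connectivity).
    print(worst_case_lookup_delay([False, False, False]))  # 30.0
```

    In this model a single lookup costs about 10 s before succeeding, and a page that performs several lookups one after another multiplies that, which is the kind of delay that could push a dashboard widget past a web server's upstream timeout.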

    Re: the GUI slowness.
    This is an ongoing and unusual issue:
    After the system has been up for approx. 10 days, the issue becomes highly noticeable, with the GUI taking a good 3-5 minutes to log in and load the main page.
    Initially I thought this only occurred when the WAN dropped and had trouble obtaining a public IP. However, as my WAN link has been more stable recently, I can confirm that it does not seem to be specifically related to WAN connectivity. I disabled the version/firmware check in the GUI, so in theory that cause is out of the picture. I can't see anything in the logs that would help isolate the issue. The system is only using 5% of RAM, 20% of disk and 2% CPU, so resources look fine.

    It would be great to hear from anyone else with similar issues.

    Restarting the FW seems to resolve this for another 10 days or so.
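    To confirm that the slowdown builds up with uptime rather than following WAN events, one option is to time a GUI request from a LAN host at regular intervals and compare the trend with the uptime. The sketch below is a hypothetical helper, not a pfSense tool: the URL is the address seen in the nginx log above (adjust it), certificate checks are disabled on the assumption that the webConfigurator uses a self-signed certificate, and it only measures fetching the login page, not a full login.

```python
import ssl
import time
import urllib.request

# Hypothetical target: the GUI address from the nginx log; adjust as needed.
GUI_URL = "https://192.168.1.254/"


def _default_opener(url, timeout):
    # Assumes a self-signed GUI certificate, so verification is skipped.
    # Only do this against your own firewall on a trusted LAN.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return urllib.request.urlopen(url, timeout=timeout, context=ctx)


def time_gui_request(url=GUI_URL, timeout=300.0, opener=_default_opener):
    """Return seconds needed to fetch `url`, or None if it raised an OSError.

    URLError, socket timeouts and SSL errors are all OSError subclasses.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout) as resp:
            resp.read()
    except OSError:
        return None
    return time.monotonic() - start


def probe_forever(interval_s=3600.0, url=GUI_URL):
    """Print one timestamped measurement per interval (stop with Ctrl+C)."""
    while True:
        elapsed = time_gui_request(url)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, "timeout/error" if elapsed is None else f"{elapsed:.1f}s")
        time.sleep(interval_s)
```

    Running `probe_forever()` hourly across a full ~10-day cycle would show whether response time creeps up gradually or jumps at a specific moment, which narrows down what to look for in the logs around that time.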

