PPPoE link dying after 2.4.4_2 update



  • Re: PPPoE disconnects requiring reboot

    Not sure if it's related to the above post, but after updating from the latest 2.4.3 to 2.4.4_2 my PPPoE link has been crashing intermittently but pretty regularly. There aren't any logs under the PPP tab in system logs, just the interface being brought down when I restart the connection.

    I don't have an idle timeout configured (there is generally always traffic going over this connection) and the only thing that's changed is the upgrade.

    I've contacted my ISP and they've confirmed nothing is wrong on their end, so I'm not really sure where to start.


  • Netgate Administrator

    You see anything logged anywhere when it goes down?

    Let's see the logs.

    It's not a general issue. There were some issues with PPPoE and VLANs in 2.4.4, but those should be fixed in p2.

    Steve



  • Hi! Thanks for replying. I've checked the logs and there isn't anything logged; the only thing that happens in them is the interface coming down and PPPoE coming back up and establishing a new connection.

    Is there a way to add debugging to the PPPoE interface or anything?



  • @Mooash said in PPPoE link dying after 2.4.4_2 update:

    only thing that happens in them is the interface coming down

    Does it go down logically or physically?
    If the latter, it's normal for PPPoE to drop out completely: no link means no connection.

    I've used PPPoE for years and I've always had all the details in the PPP log.
    (Although I'm sending all my logs to a remote syslog server, so all the details are there all the time.)

    Let's see the logs...



  • @Gertjan here's the last time it happened:

    dpinger

    Apr 9 21:12:24 	dpinger 		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 192.168.xx.x bind_addr 192.168.xx.x identifier "LANGW "
    Apr 9 21:12:24 	dpinger 		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 10.20.xx.xxx bind_addr xxx.xxx.xx.xxx identifier "WAN_PPPOE "
    Apr 9 21:12:04 	dpinger 		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 192.168.xx.x bind_addr 192.168.xx.x identifier "LANGW "
    Apr 9 21:12:04 	dpinger 		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 10.20.xx.xxx bind_addr 202.xxx.xx.xxx identifier "WAN_PPPOE "
    Apr 9 21:01:40 	dpinger 		WAN_PPPOE 10.20.xx.xxx: Clear latency 334382us stddev 459973us loss 0%
    Apr 9 21:00:16 	dpinger 		WAN_PPPOE 10.20.xx.xxx: Alarm latency 507440us stddev 484840us loss 0%
    Apr 9 21:00:01 	dpinger 		WAN_PPPOE 10.20.xx.xxx: Clear latency 326196us stddev 220164us loss 0% 
    

    rc.gateway_alarm

    Apr 9 21:00:16 	rc.gateway_alarm 	90058 	>>> Gateway alarm: WAN_PPPOE (Addr:10.20.xx.xxx Alarm:1 RTT:507.440ms RTTsd:484.840ms Loss:0%) 
    

    And here's me restarting the PPPoE interface (disconnecting and reconnecting ~12 minutes later). Before that there's nothing in the ppp logs.

    Apr 9 21:11:55 	ppp 		caught fatal signal TERM
    Apr 9 04:05:05 	ppp 		[wan] IFACE: Rename interface ng0 to pppoe1 
    

    I've tried talking to my ISP and they're convinced they aren't even seeing me bringing anything down.



  • @Mooash said in PPPoE link dying after 2.4.4_2 update:

    they aren't even seeing me bringing anything down.

    Whooo.
    If they can't see you 'disconnecting', this means that you can consider the connection 'non-existent'.
    Is this what dpinger is telling you?


  • Netgate Administrator

    Hmm, so nothing at all in the ppp logs. And it looks like there was one minute of latency over the 500ms level but that cleared. So it is still pinging the gateway at that time and getting responses?

    Steve
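
    To make the reading of the dpinger lines above concrete, here is a minimal sketch that parses an `Alarm`/`Clear` status line and compares it with the `latency_alarm 500ms` and `loss_alarm 20%` settings shown in the posted log. The function name `parse_dpinger_status` and the returned field names are hypothetical, chosen only for illustration; the line format is taken from the log excerpt in this thread.

```python
import re

# Matches dpinger status lines of the form seen in the posted log, e.g.
#   WAN_PPPOE 10.20.xx.xxx: Alarm latency 507440us stddev 484840us loss 0%
STATUS_RE = re.compile(
    r"(?P<gateway>\S+) (?P<addr>\S+): (?P<state>Alarm|Clear) "
    r"latency (?P<latency_us>\d+)us stddev (?P<stddev_us>\d+)us "
    r"loss (?P<loss_pct>\d+)%"
)


def parse_dpinger_status(line, latency_alarm_ms=500, loss_alarm_pct=20):
    """Parse one dpinger status line (hypothetical helper).

    The default thresholds mirror latency_alarm and loss_alarm from the log.
    Returns None if the line is not an Alarm/Clear status line.
    """
    match = STATUS_RE.search(line)
    if match is None:
        return None
    latency_ms = int(match.group("latency_us")) / 1000.0  # microseconds to ms
    loss_pct = int(match.group("loss_pct"))
    return {
        "gateway": match.group("gateway"),
        "state": match.group("state"),
        "latency_ms": latency_ms,
        "loss_pct": loss_pct,
        "over_latency": latency_ms >= latency_alarm_ms,
        "over_loss": loss_pct >= loss_alarm_pct,
    }
```

    Fed the 21:00:16 line from the log, this reports a latency of about 507 ms, just over the 500 ms threshold, with 0% loss, which matches the observation that the gateway was still answering pings when the alarm fired.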



  • @stephenw10 yeah, it's weird. Existing connections (TCP or UDP) for things like Twitch streams and SSH sessions seem to stay up and work fine, but any new connections don't succeed.

    Restarting the ppp connection brings everything back up and working 100% of the time. It's kept happening over the last few days and there's still nothing in the ppp logs. Is there any way to add some more debugging to the interface/session or anything?

    This connection has been stable for ~3-4 years before this latest update.


  • Netgate Administrator

    Seems more like an inability to open new firewall states on WAN. How many states do you have open? It may be exhausting the limit if you are running some application that really eats states.
    I would expect to see some error logged if it couldn't open states though.

    Steve
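
    One way to answer the "how many states" question is to compare the current state count with the hard limit. The sketch below assumes you have captured the text output of `pfctl -si` and `pfctl -sm` from a shell on the firewall, and that it contains a `current entries` line and a `states hard limit` line respectively; that layout is an assumption about typical pfctl output, not something shown in this thread. The function name `state_table_usage` is hypothetical.

```python
import re


def state_table_usage(info_text, limits_text):
    """Return (current, limit, fraction_used) from captured pfctl output.

    info_text:   assumed to be the output of `pfctl -si`
    limits_text: assumed to be the output of `pfctl -sm`
    Returns None if either figure cannot be found.
    """
    current_match = re.search(r"current entries\s+(\d+)", info_text)
    limit_match = re.search(
        r"^states\s+hard limit\s+(\d+)", limits_text, re.MULTILINE
    )
    if current_match is None or limit_match is None:
        return None
    current = int(current_match.group(1))
    limit = int(limit_match.group(1))
    if limit == 0:
        return None  # avoid dividing by zero on unexpected input
    return current, limit, current / limit
```

    A fraction close to 1.0 at the moment new connections start failing would point towards state exhaustion; a low fraction would suggest looking elsewhere.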



  • @stephenw10 any idea where that'd be logged? Can't see anything in the system logs or firewall logs or anything.


  • Netgate Administrator

    It would be in the system log if it was anywhere. Unless it stops logging, which can happen with a failing drive for example.

    Do you still have access to the gui when this happens? Can you still open new states between other local interfaces?

    Steve



  • @stephenw10 I can't open anything WAN bound, but opening a connection to the router itself works completely fine; that's how I bounce the PPPoE interface to get everything working again.

    But it does die over multiple VLANs, yeah.


  • Netgate Administrator

    Ok, but all the connections that fail were using WAN? You can still open new connections between internal subnets?

    Steve



  • I haven't tried across subnets yet, I'll have a look at that next time it dies. Thanks again for everyone's help so far.

