2.7.0 PPPoE Continually Reconnecting
-
I came across this issue yesterday when I upgraded the backup of an HA pair to 2.7.0. My setup uses carp to handle the switchover from primary to backup, linking the pppoe interface to the carp VIP. When I tested failover, the backup enabled the pppoe interface and brought up the link but did not assign IPv4 local and remote addresses to the interface which meant that IPv4 was not routed and no IPv4 default route was installed. Sadly, I forgot to preserve the logs before downgrading back to 2.6.0.
Any ideas on workarounds for this?
-
That's not the same issue. The issue discussed in this thread is for VIPs on the PPPoE interface. It sounds like what you have is a PPPoE interface on a VIP.
You should start a new thread for that but I will say that what you're doing is an unsupported configuration. We have seen a few reports of it though.
Steve
-
@stephenw10 Thanks for the response - is there a recommended way of managing pppoe failover? I think I discovered my current approach from this thread: https://forum.netgate.com/topic/135904/configure-an-pppoe-on-an-carp-if/18.
-
Not really. HA requires static IPs for a supported setup.
-
This PPPoE + VIP issue is still present in 2.7.2/23.09.1. Until this issue is resolved, I'm stuck on 22.05 with no security patches. It's really unfortunate this isn't a higher priority issue.
-
Do you have a system log showing this happening? I though we had one here to reference but I'm not finding it now.
-
@stephenw10 I've attached a redacted version of my system.log file showing this.
"207.[REDACTED]" is the initial IP address my firewall receives before it applies one of my static IPs, which is "168.[REDACTED]."
Let me know if you need anything else!
system.log_redacted.txt -
Thanks!
-
@stephenw10 Any progress on this? I just looped back to update from 2.6 to 2.7.0 and have the exact same symptoms.
I now have 2.6 on my carp main and 2.7.0 on carp backup, so pretty easy to run tests if there's something you need. :)
(FWIW, we connect to CenturyLink, so the WAN PPPoE uses VLAN 201 -- thus this is not related to the VLAN=0 issue) -
Specifically the PPPoE with a VIP issue?
-
@stephenw10 yep. Continual PPPoE restarts...
I just did a smart diff of a good and bad log. FWIW...
(I've obfuscated private data. All obfuscations begin and end with "|", which is not found in the log.)A) This part's identical
[wan] Bundle: Interface ng0 created [wan_link0] Link: OPEN event [wan_link0] LCP: Open event [wan_link0] LCP: state change Initial --> Starting [wan_link0] LCP: LayerStart [wan_link0] PPPoE: Connecting to ''
B) This part is then seen in 2.6 (good connect) (BAD connect has a ~50 sec delay here)
PPPoE: rec'd ACNAME "||local ISP dom.ain||" [wan_link0] PPPoE: connection successful [wan_link0] Link: UP event [wan_link0] LCP: Up event [wan_link0] LCP: state change Starting --> Req-Sent [wan_link0] LCP: SendConfigReq #1 [wan_link0] PROTOCOMP [wan_link0] MRU 1492 [wan_link0] MAGICNUM 0x4c18d9e4 [wan_link0] LCP: rec'd Configure Request #37 (Req-Sent) [wan_link0] MRU 1492 [wan_link0] AUTHPROTO CHAP MD5 [wan_link0] MAGICNUM 0x48bb2420 [wan_link0] LCP: SendConfigAck #37 [wan_link0] MRU 1492 [wan_link0] AUTHPROTO CHAP MD5 [wan_link0] MAGICNUM 0x48bb2420 [wan_link0] LCP: state change Req-Sent --> Ack-Sent [wan_link0] LCP: rec'd Configure Ack #1 (Ack-Sent) [wan_link0] PROTOCOMP [wan_link0] MRU 1492 [wan_link0] MAGICNUM 0x||32 bit hex|| [wan_link0] LCP: state change Ack-Sent --> Opened [wan_link0] LCP: auth: peer wants CHAP, I want nothing [wan_link0] LCP: LayerUp [wan_link0] CHAP: rec'd CHALLENGE #159 len: 41 [wan_link0] Name: "JUNOS" [wan_link0] CHAP: Using authname "||user@dom.ain auth||" [wan_link0] CHAP: sending RESPONSE #159 len: 46 [wan_link0] CHAP: rec'd SUCCESS #159 len: 4 [wan_link0] LCP: authorization successful [wan_link0] Link: Matched action 'bundle "wan" ""' [wan_link0] Link: Join bundle "wan" [wan] Bundle: Status update: up 1 link, total bandwidth 64000 bps [wan] IPCP: Open event [wan] IPCP: state change Initial --> Starting [wan] IPCP: LayerStart [wan] IPCP: Up event [wan] IPCP: state change Starting --> Req-Sent [wan] IPCP: SendConfigReq #1 [wan] IPADDR 0.0.0.0 [wan] COMPPROTO VJCOMP, 16 comp. channels, no comp-cid [wan] IPCP: rec'd Configure Request #177 (Req-Sent) [wan] IPADDR ||ip.v.4.addr|| [wan] ||ip.v.4.addr|| is OK [wan] IPCP: SendConfigAck #177 [wan] IPADDR ||ip.v.4.addr|| [wan] IPCP: state change Req-Sent --> Ack-Sent [wan] IPCP: rec'd Configure Reject #1 (Ack-Sent) [wan] COMPPROTO VJCOMP, 16 comp. channels, no comp-cid
C) This part is seen in both (next)
[wan] IPCP: SendConfigReq #2 [wan] IPADDR 0.0.0.0 [wan] IPCP: rec'd Configure Nak #2 (Ack-Sent) [wan] IPADDR ||my.static.ipv4.wan|| [wan] ||my.static.ipv4.wan|| is OK [wan] IPCP: SendConfigReq #3 [wan] IPADDR ||my.static.ipv4.wan|| [wan] IPCP: rec'd Configure Ack #3 (Ack-Sent) [wan] IPADDR ||my.static.ipv4.wan|| [wan] IPCP: state change Ack-Sent --> Opened [wan] IPCP: LayerUp [wan] ||my.static.ipv4.wan|| -> ||ip.v.4.addr|| [wan] IFACE: Up event [wan] IFACE: Rename interface ng0 to pppoe0 [wan] IFACE: Add description "WAN"
D) This final part is only in 2.7 bad...
Multi-link PPP daemon for FreeBSD process 98989 started, version 5.9 waiting for process 82135 to die... caught fatal signal TERM [wan] IFACE: Close event [wan] IPCP: Close event [wan] IPCP: state change Opened --> Closing [wan] IPCP: SendTerminateReq #4 [wan] IPCP: LayerDown [wan] IFACE: Removing IPv4 address from pppoe0 failed(IGNORING for now. This should be only for PPPoE friendly!): Can't assign requested address [wan] IFACE: Down event [wan] IFACE: Rename interface pppoe0 to pppoe0 [wan] IFACE: Set description "WAN" [wan] IPCP: rec'd Terminate Ack #4 (Closing) [wan] IPCP: state change Closing --> Closed [wan] IPCP: LayerFinish [wan] Bundle: No NCPs left. Closing links... [wan] Bundle: closing link "wan_link0"... [wan_link0] LCP: rec'd Terminate Request #28 (Opened) [wan_link0] LCP: state change Opened --> Stopping [wan_link0] Link: Leave bundle "wan" [wan] Bundle: Status update: up 0 links, total bandwidth 9600 bps [wan] IPCP: Close event [wan] IPCP: Down event [wan] IPCP: state change Closed --> Initial [wan_link0] LCP: SendTerminateAck #2 [wan_link0] LCP: LayerDown [wan_link0] PPPoE: connection closed [wan_link0] Link: DOWN event [wan_link0] Link: giving up after 0 reconnection attempts [wan_link0] LCP: Close event [wan_link0] LCP: state change Stopping --> Closing [wan_link0] LCP: Down event [wan_link0] LCP: LayerFinish [wan_link0] LCP: state change Closing --> Initial [wan_link0] Link: CLOSE event [wan_link0] LCP: Close event waiting for process 82135 to die... waiting for process 82135 to die... [wan] Bundle: Shutdown [wan_link0] Link: Shutdown process 82135 terminated web: web is not running
Then it immediately does another loop.
Perhaps a red herring, but the result of identical config on vtnet0 shows two differences between 2.6 and 2.7:
- 2.7 has flag LOWER_UP
- 2.7 has media "Ethernet autoselect"
Good 2.6
vtnet0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=900b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,LINKSTATE> ether ||my:ma:ca:dd:re:ss|| inet6 ||local fe80 ip6||%vtnet0 prefixlen 64 scopeid 0x1 media: Ethernet 10Gbase-T <full-duplex> status: active nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
"Bad" 2.7
vtnet0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500 options=900b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,LINKSTATE> ether ||my:ma:ca:dd:re:ss|| inet6 ||local fe80 ip6||%vtnet0 prefixlen 64 scopeid 0x1 media: Ethernet autoselect (10Gbase-T <full-duplex>) status: active nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
-
Mmm, OK. Nothing I'm aware of. Let me see if anyone has anything internally...
-
Seems to be fixed in 24.03. At least in my test setup using an IPAlias VIP.
Is anyone here able to confirm that?
-
@stephenw10 Thanks. Running a very nonprofit, we can't afford the Plus upgrade, so I'll look forward to the next CE release. :)
-
Yup, just looking for confirmation. Though this test setup was hitting it that doesn't mean it's fixed for all cases necessarily. There was a bunch of work went into the gateway/WAN handling in 24.03 though.
If we can confirm the fix we know that will work in CE.