23.01 install results in no internet

stephenw10

Is there anything in the main system log showing what starts that?

It's not quite the same issue. OP's log here shows the terminate received via LCP and yours doesn't?

bert64

@stephenw10 Not that I've seen; I had to revert to 22.05 to get back online. Is it possible to mount the alternative boot environment to retrieve the logs?

stephenw10

Not directly that I'm aware of; you'd need to boot into it.

bert64

@stephenw10 What am I thinking... I have a syslog server.
It seems to spawn rc.newwanip for pppoe0, then 2 seconds later a new instance of mpd5 is spawned which kills the old one:

```
Mar 28 22:32:55 FW1 charon[26243]: 11[KNL] x.x.x.x appeared on pppoe0
Mar 28 22:32:55 FW1 check_reload_status[388]: rc.newwanip starting pppoe0
Mar 28 22:32:56 FW1 dpinger[37838]: exiting on signal 15
Mar 28 22:32:56 FW1 dpinger[37566]: exiting on signal 15
Mar 28 22:32:56 FW1 check_reload_status[388]: Configuring interface opt5
Mar 28 22:32:57 FW1 dpinger[38249]: exiting on signal 15
Mar 28 22:32:57 FW1 dpinger[36458]: exiting on signal 15
Mar 28 22:32:57 FW1 dpinger[35464]: exiting on signal 15
Mar 28 22:32:57 FW1 dpinger[12865]: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 1  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr fe80::a66c:2aff:fe77:3300%pppoe0  bind_addr fe80::ec4:7aff:fe19:898c%pppoe0  identifier "WAND_DHCP6 "
Mar 28 22:32:57 FW1 dpinger[13165]: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 1  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr 2001:XXX::73  bind_addr 2001:XXX::240  identifier "RTFW "
Mar 28 22:32:57 FW1 dpinger[13674]: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 1  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr 2001:XXX::6464  bind_addr 2001:XXX::240  identifier "Jool "
Mar 28 22:32:57 FW1 dpinger[14103]: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 1  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr 2001:XXX::777  bind_addr 2001:XXX::240  identifier "CISCO_3750 "
Mar 28 22:32:57 FW1 dpinger[14579]: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 1  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr 2001:XXX::100  bind_addr 2001:4d48:ad57:40a::240  identifier "LIABFW "
Mar 28 22:32:57 FW1 dhcp6c[40213]: Sending Information Request
Mar 28 22:32:57 FW1 dhcp6c[40213]: set client ID (len 10)
Mar 28 22:32:57 FW1 dhcp6c[40213]: set elapsed time (len 2)
Mar 28 22:32:57 FW1 dhcp6c[40213]: send information request to ff02::1:2%pppoe0
Mar 28 22:32:57 FW1 dhcp6c[40213]: reset a timer on pppoe0, state=INFOREQ, timeo=2, retrans=3617
Mar 28 22:32:57 FW1 charon[26243]: 16[KNL] XXX disappeared from pppoe0
Mar 28 22:32:57 FW1 ppp[20529]: Multi-link PPP daemon for FreeBSD
Mar 28 22:32:57 FW1 ppp[20529]:
Mar 28 22:32:57 FW1 ppp[20529]: process 20529 started, version 5.9
Mar 28 22:32:57 FW1 ppp[20529]: waiting for process 50355 to die...
```
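
For anyone wanting to check their own syslog export for the same pattern, here is a minimal Python sketch. It assumes the line layout seen in the log above (`<Mon> <day> <time> <host> <program>[<pid>]: <message>`) and looks for a `ppp` "process N started" line arriving shortly after an `rc.newwanip starting <iface>` line. The function names and the 5-second window are illustrative choices, not anything taken from pfSense or mpd5.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> <program>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]: ?(?P<msg>.*)$"
)
NEWWANIP_RE = re.compile(r"rc\.newwanip starting (\S+)")
MPD_START_RE = re.compile(r"process (\d+) started")


def parse_ts(text):
    # Syslog omits the year; a fixed placeholder year is enough for time deltas.
    return datetime.strptime("2000 " + text, "%Y %b %d %H:%M:%S")


def find_respawns(lines, window=timedelta(seconds=5)):
    """Return (iface, new_pid, seconds_after_trigger) for each ppp start
    logged within `window` of the most recent rc.newwanip run."""
    hits = []
    last_newwanip = None  # (timestamp, interface) of the latest trigger
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip fences, blank lines, and anything unparseable
        ts = parse_ts(m["ts"])
        trigger = NEWWANIP_RE.search(m["msg"])
        if trigger:
            last_newwanip = (ts, trigger.group(1))
            continue
        started = MPD_START_RE.search(m["msg"]) if m["prog"] == "ppp" else None
        if started and last_newwanip and ts - last_newwanip[0] <= window:
            delta = (ts - last_newwanip[0]).total_seconds()
            hits.append((last_newwanip[1], int(started.group(1)), delta))
    return hits


sample = [
    "Mar 28 22:32:55 FW1 check_reload_status[388]: rc.newwanip starting pppoe0",
    "Mar 28 22:32:57 FW1 ppp[20529]: process 20529 started, version 5.9",
]
print(find_respawns(sample))
```

A hit only shows that the two events were close in time; it does not prove that `rc.newwanip` is what launched the new daemon.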

stephenw10

Seems like a race condition. What's opt5? How is it configured?

bert64

@stephenw10 opt5 is the pppoe interface, config is a little convoluted...

There is a WAN interface (igb2) connected to a switch, a tagged VLAN interface on top of it (igb2 VLAN 80), and then a CARP VIP on that VLAN interface with the PPP session bound to it for failover.
Taking the backup firewall offline and binding pppoe directly to igb2.80 made no difference.

ppp configuration is mostly default - username, password, null service name, mtu set to 1500 to use baby jumbo frames.

The interface is configured for legacy IP (IPv4) using PPPoE, and then DHCPv6 for IPv6. The /56 prefix delegation is static, so DHCPv6 is set to "information only" as I don't need to dynamically apply it to other interfaces.

stephenw10

Hmm, that should be OK. Though technically PPPoE is not supported in HA.

How is igb2 configured? In that sort of setup I would want to see it only static or no IP.

bert64

@stephenw10 no ip on igb2..
The HA with PPPoE works well with 22.05 and earlier versions.

stephenw10

Hmm, is igb2 assigned/enabled?

bert64

@stephenw10 yeah igb2 is assigned and active.
The same config works with 22.05; performing an upgrade to 23.01 with no config changes triggers the problem.
The logs show the PPP session coming up, and then a second instance of mpd5 is spawned which causes the first one to terminate.
It seems mpd5 is always started with the `-k` option:
`-k, --kill Kill running mpd process before start`
Just trying to work out why a second instance is getting spawned...
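
To make the suspected loop concrete, here is a toy Python model of "kill before start" behaviour. It is not mpd5's implementation: `ToyLauncher`, `simulate`, and the PID numbers are hypothetical, and the idea that each link-up event relaunches the daemon is an assumption under discussion in this thread, not a confirmed cause.

```python
class ToyLauncher:
    """Hypothetical stand-in for a launcher with kill-before-start semantics."""

    def __init__(self):
        self.running_pid = None
        self.next_pid = 1000  # arbitrary starting PID for the model
        self.events = []

    def start(self):
        # Mirrors the idea behind -k: stop any running instance first.
        if self.running_pid is not None:
            self.events.append(f"killed {self.running_pid}")
        self.running_pid = self.next_pid
        self.next_pid += 1
        self.events.append(f"started {self.running_pid}")


def simulate(link_up_triggers_restart, rounds=3):
    """Model `rounds` link-up events; optionally each one relaunches the daemon."""
    launcher = ToyLauncher()
    launcher.start()  # initial connection attempt
    for _ in range(rounds):
        if link_up_triggers_restart:
            launcher.start()  # assumed: a link-up script launches the daemon again
    return launcher.events


print(simulate(link_up_triggers_restart=True, rounds=2))
print(simulate(link_up_triggers_restart=False, rounds=2))
```

In the model, the session only stays up when nothing relaunches the daemon after the link comes up, which is why finding the source of the second spawn matters more than the `-k` flag itself.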

stephenw10

Yes, something is causing that, and my hunch is that it's because igb2 is assigned, so the VLAN and PPPoE subinterfaces are being restarted.
Is there any reason igb2 is assigned? Can you unassign it as a test?

bert64

@stephenw10 It's just assigned as WAN with no IP configuration...
I will give it another try tonight. I'm working from home and this device controls the connection, so I can't take it down during the day.

stephenw10

Right, but it doesn't need to be assigned. And having it assigned means a bunch more scripts get triggered whenever the VLAN, or the PPPoE link on the VLAN, gets bounced.

bert64

@stephenw10 unassigning the parent interface made no difference, still the endless loop of connecting and killing mpd5..

stephenw10

Hmm, and this happens at the first time it tries to connect? Or at the first reconnect?

bert64

@stephenw10 as soon as it boots, and does the same thing if i disable and re-enable the interface too.

stephenw10

Hmm, there must be something different there with your setup, I'm not seeing anything like that in PPPoE links here.
Are you able to test it without running it on a CARP VIP?

bert64

@stephenw10 Makes no difference removing CARP and binding it direct to the vlan interface..

stephenw10

Hmm, are you able to test a basic PPPoE connection only there?

I'm not able to replicate what you're seeing. It must be part of your config triggering it somehow.

bert64

@stephenw10 I'll try making a fresh install on a new device with only the pppoe config and nothing else, see how it behaves...