23.01 install results in no internet
-
@stephenw10 opt5 is the pppoe interface, config is a little convoluted...
There is a wan interface (igb2) which is connected to a switch, there is a tagged vlan interface (igb2 vlan 80), then there is a carp vip on that interface with the ppp session bound to it for failover.
Taking the backup firewall offline and binding pppoe directly to igb2.80 made no difference.
The ppp configuration is mostly default - username, password, null service name, mtu set to 1500 to use baby jumbo frames.
Interface configured legacy ip using pppoe, and then dhcpv6 for ipv6 with a /56 prefix delegation set to "information only" as it's static and i don't need to dynamically apply it to other interfaces.
-
Hmm, that should be OK. Though technically PPPoE is not supported in HA.
How is igb2 configured? In that sort of setup I would want to see it only static or no IP.
-
@stephenw10 no ip on igb2..
The HA with PPPoE works well with 22.05 and earlier versions.
-
Hmm, is igb2 assigned/enabled?
-
@stephenw10 yeah igb2 is assigned and active.
The same config works with 22.05, performing an upgrade to 23.01 with no config changes triggers the problem.
The logs show the PPP session coming up, and then a second instance of mpd5 is spawned which causes the first one to terminate.
It seems mpd5 is always started with the -k option:
-k, --kill Kill running mpd process before start
just trying to work out why a second instance is getting spawned...
-
Yes, something is causing that and my hunch is that it's because igb2 is assigned such that the vlan and pppoe subinterfaces are being restarted.
Is there any reason igb2 is assigned? Can you unassign it as a test?
-
@stephenw10 It's just assigned as WAN with no IP configuration...
I will give it another try tonight, i'm working from home and this device controls the connection so can't take it down during the day.
-
Right, but it doesn't need to be assigned. Having it assigned means a bunch more scripts get triggered when the VLAN, or the PPPoE link on the VLAN, gets bounced.
-
@stephenw10 unassigning the parent interface made no difference, still the endless loop of connecting and killing mpd5..
-
Hmm, and this happens at the first time it tries to connect? Or at the first reconnect?
-
@stephenw10 as soon as it boots, and does the same thing if i disable and re-enable the interface too.
-
Hmm, there must be something different there with your setup, I'm not seeing anything like that in PPPoE links here.
Are you able to test it without running it on a CARP VIP?
-
@stephenw10 Makes no difference removing CARP and binding it direct to the vlan interface..
-
Hmm, are you able to test a basic PPPoE connection only there?
I'm not able to replicate what you're seeing. It must be part of your config triggering it somehow.
-
@stephenw10 I'll try making a fresh install on a new device with only the pppoe config and nothing else, see how it behaves...
-
@stephenw10 So the strange thing is a fresh build with no other config doesn't have this problem, but going through and stripping all the packages and routes out of the existing setup doesn't solve it either...
Seems to be hitting the following in rc.newwanip:
    /*
     * NOTE: Take care of openvpn, no-ip or similar interfaces if you generate the event to reconfigure an interface.
     *      i.e. OpenVPN might be in tap mode and not have an ip.
     */
    if ($curwanip == "0.0.0.0" || !is_ipaddr($curwanip)) {
        if (substr($interface_real, 0, 4) != "ovpn") {
            if (!empty($config['interfaces'][$interface]['ipaddr'])) {
                log_error("rc.newwanip: Failed to update {$interface} IP, restarting...");
                send_event("interface reconfigure {$interface}");
                return;
            }
        }
    }
If i comment out the send_event line the ppp session doesn't get constantly restarted, but it doesn't add routes for it either, so it's not fully working. I'm not sure where the send_event code ends up, so i've not had a chance to debug further.
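For anyone following along, here is a rough Python model of that branch, just to make the condition easier to reason about. It is not the pfSense code: `is_ipaddr` below is a stand-in built on Python's `ipaddress` module, `should_reconfigure` is a made-up name, and the `"pppoe"` value for the configured `ipaddr` is my assumption about what the config holds for a PPPoE interface.

```python
import ipaddress


def is_ipaddr(value):
    """Stand-in for pfSense's is_ipaddr(): True for a valid IPv4 or IPv6 string."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def should_reconfigure(curwanip, interface_real, configured_ipaddr):
    """Model of the quoted branch: True when the reconfigure event would be sent."""
    # No usable address: literally 0.0.0.0, or not an IP at all (e.g. empty string).
    no_address = curwanip == "0.0.0.0" or not is_ipaddr(curwanip)
    # OpenVPN interfaces are exempt because they may legitimately have no IP.
    not_openvpn = not interface_real.startswith("ovpn")
    # Only interfaces with some IPv4 configuration set are restarted.
    return no_address and not_openvpn and bool(configured_ipaddr)


# Empty address on pppoe0 (assumed config value "pppoe"): event is sent.
print(should_reconfigure("", "pppoe0", "pppoe"))            # True
# Valid address (documentation range): no event.
print(should_reconfigure("192.0.2.10", "pppoe0", "pppoe"))  # False
```

So if the address lookup comes back empty every time the script runs, this branch fires every time, which would fit the connect/kill loop.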
-
Exactly the same WAN/PPPoE config in the clean install?
This feels like two interfaces interacting, like the ppp and the vlan it runs on, as we speculated.
-
@stephenw10 the fresh install was also using vlan tagging, and switching the existing firewall to use the native interface instead of a tagged interface made no difference... perhaps it's a combination of lots of things slowing it down enough to cause a race condition...
the fresh install was a vm on a fairly powerful hypervisor, the existing one is an older atom board so there's a significant performance disparity.
do you happen to know what code is executed when you send the "interface reconfigure" event?
-
I don't know that without digging into it. But it does vary depending on what the interface is.
It could be a race condition, yes. That's going to be considerably faster on a VM with one interface than an Atom. Check the logs from each. Do you see entries in a different order?
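If comparing by eye gets tedious, something like the following Python sketch could help. It assumes the usual `Mon DD HH:MM:SS host process[pid]: message` syslog layout, drops the timestamp, hostname and PID, and diffs what is left so only ordering or content differences show up. The function names are made up for illustration.

```python
import difflib
import re

# "May 23 21:18:31 FW1 php-fpm[74460]: message" -> process "php-fpm", message text
LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+([^\[:\s]+)(?:\[\d+\])?:\s*(.*)$"
)


def events(log_text):
    """Return 'process: message' strings in log order, ignoring unparsable lines."""
    out = []
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m:
            out.append(f"{m.group(1)}: {m.group(2)}")
    return out


def order_diff(log_a, log_b):
    """Unified diff of the two event sequences; an empty list means same order."""
    return list(
        difflib.unified_diff(
            events(log_a), events(log_b), "A", "B", lineterm="", n=0
        )
    )
```

Feed it the same time window from each box (for example, from interface enable to the first rc.newwanip entry) and look at which lines move.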
-
@stephenw10 so from what i can see, on the working system a log entry comes up:
May 23 21:18:31 FW1 php-fpm[74460]: /rc.newwanip: rc.newwanip: on (IP address: x.x.x.x) (interface: WAND[opt5]) (real interface: pppoe0).
whereas on the other host:
May 23 21:23:41 FW2 php-fpm[854]: /rc.newwanip: rc.newwanip: on (IP address: ) (interface: WAND[opt5]) (real interface: pppoe0).
so neither find_interface_ip nor get_interface_ip is returning an address, but can't figure out why not..
If i hard code it to return my static address whenever the interface is pppoe0, it comes online and everything else seems to work, but obviously this is quite a nasty hack.
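In case it is useful for checking a longer log, here is a small Python sketch that pulls the address field out of those rc.newwanip entries and lists the interfaces where it came back empty. The regex is written against the two log lines quoted above only; the function names are made up, and other pfSense versions may format the entry differently.

```python
import re

# Matches e.g. "rc.newwanip: on (IP address: x.x.x.x) (interface: WAND[opt5]) (real interface: pppoe0)"
NEWWANIP = re.compile(
    r"rc\.newwanip: on \(IP address: (?P<ip>[^)]*)\) "
    r"\(interface: (?P<iface>[^)]*)\) \(real interface: (?P<real>[^)]*)\)"
)


def newwanip_events(log_text):
    """Return (real_interface, ip) for every rc.newwanip entry, in log order."""
    results = []
    for line in log_text.splitlines():
        m = NEWWANIP.search(line)
        if m:
            results.append((m.group("real"), m.group("ip").strip()))
    return results


def missing_address(log_text):
    """Real interfaces for which rc.newwanip ran without an address."""
    return [real for real, ip in newwanip_events(log_text) if not ip]
```

Run against the two entries above, the FW1 line yields `("pppoe0", "x.x.x.x")` and the FW2 line shows up in `missing_address` as `pppoe0`.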