23.01 install results in no internet
-
Yes, something is causing that and my hunch is that it's because igb2 is assigned such that the vlan and pppoe subinterfaces are being restarted.
Is there any reason igb2 is assigned? Can you unassign it as a test?
-
@stephenw10 It's just assigned as WAN with no IP configuration...
I will give it another try tonight. I'm working from home and this device controls the connection, so I can't take it down during the day.
-
Right, but it doesn't need to be assigned. Having it assigned means a bunch more scripts get triggered when the VLAN, or the PPPoE link on the VLAN, gets bounced.
-
@stephenw10 unassigning the parent interface made no difference, still the endless loop of connecting and killing mpd5..
-
Hmm, and this happens at the first time it tries to connect? Or at the first reconnect?
-
@stephenw10 As soon as it boots, and it does the same thing if I disable and re-enable the interface too.
-
Hmm, there must be something different there with your setup, I'm not seeing anything like that in PPPoE links here.
Are you able to test it without running it on a CARP VIP?
-
@stephenw10 Makes no difference removing CARP and binding it direct to the vlan interface..
-
Hmm, are you able to test a basic PPPoE connection only there?
I'm not able to replicate what you're seeing. It must be part of your config triggering it somehow.
-
@stephenw10 I'll try making a fresh install on a new device with only the pppoe config and nothing else, see how it behaves...
-
@stephenw10 So the strange thing is that a fresh build with no other config doesn't have this problem, but stripping all the packages and routes out of the existing setup doesn't solve it either...
Seems to be hitting the following in rc.newwanip:
    /*
     * NOTE: Take care of openvpn, no-ip or similar interfaces if you generate the event to reconfigure an interface.
     * i.e. OpenVPN might be in tap mode and not have an ip.
     */
    if ($curwanip == "0.0.0.0" || !is_ipaddr($curwanip)) {
        if (substr($interface_real, 0, 4) != "ovpn") {
            if (!empty($config['interfaces'][$interface]['ipaddr'])) {
                log_error("rc.newwanip: Failed to update {$interface} IP, restarting...");
                send_event("interface reconfigure {$interface}");
                return;
            }
        }
    }
If I comment out the send_event line, the PPP session doesn't get constantly restarted, but it doesn't add routes for it either, so it's not fully working. I'm not sure where the send_event code ends up, so I've not had a chance to debug further.
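For anyone following along, here is a rough Python restatement of the check quoted above, just to make the loop easier to see. It is only a sketch of the condition, not the pfSense code: `should_reconfigure` and its parameters are made-up names, and the `"pppoe"` value for the configured `ipaddr` is an assumed placeholder for a non-empty config entry.

```python
import ipaddress


def is_ipaddr(value):
    """Rough stand-in for the PHP is_ipaddr() helper (assumption: any valid v4/v6 literal)."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def should_reconfigure(curwanip, interface_real, configured_ipaddr):
    """Return True when the quoted check would fire the 'interface reconfigure' event."""
    if curwanip == "0.0.0.0" or not is_ipaddr(curwanip):
        # OpenVPN interfaces are skipped, as in the quoted snippet.
        if not interface_real.startswith("ovpn"):
            # Only interfaces with a non-empty ipaddr setting are restarted.
            if configured_ipaddr:
                return True
    return False


# An empty address on pppoe0 triggers a restart; a detected address does not.
print(should_reconfigure("", "pppoe0", "pppoe"))               # restart requested
print(should_reconfigure("198.51.100.65", "pppoe0", "pppoe"))  # no restart
```

If the address lookup keeps coming back empty, every pass through this check requests another reconfigure, which would match the endless connect/kill cycle.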
-
Exactly the same WAN/PPPoE config in the clean install?
This feels like two interfaces interacting like the ppp on the vlan as we speculated.
-
@stephenw10 The fresh install was also using VLAN tagging, and switching the existing firewall to use the native interface instead of a tagged interface made no difference... perhaps it's a combination of lots of things slowing it down enough to cause a race condition...
The fresh install was a VM on a fairly powerful hypervisor, while the existing one is an older Atom board, so there's a significant performance disparity.
Do you happen to know what code is executed when you send the "interface reconfigure" event?
-
I don't know that without digging into it. But it does vary depending on what the interface is.
It could be a race condition, yes. That's going to be considerably faster on a VM with one interface than an Atom. Check the logs from each. Do you see entries in a different order?
-
@stephenw10 so from what i can see, on the working system a log entry comes up:
May 23 21:18:31 FW1 php-fpm[74460]: /rc.newwanip: rc.newwanip: on (IP address: x.x.x.x) (interface: WAND[opt5]) (real interface: pppoe0).
whereas on the other host:
May 23 21:23:41 FW2 php-fpm[854]: /rc.newwanip: rc.newwanip: on (IP address: ) (interface: WAND[opt5]) (real interface: pppoe0).
So neither find_interface_ip nor get_interface_ip is returning an address, but I can't figure out why not..
If I hard-code it to return my static address whenever the interface is pppoe0, it comes online and everything else seems to work, but obviously this is quite a nasty hack.
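In case it helps anyone comparing logs from two boxes, a small Python sketch that pulls the fields out of the rc.newwanip entries above and flags an empty address. The regex is written against the two log lines quoted in this post only; `parse_newwanip` is a made-up helper name, and other pfSense versions may format the line differently.

```python
import re

# Pattern matched against the log format quoted above (an assumption for other versions).
NEWWANIP_RE = re.compile(
    r"rc\.newwanip: on \(IP address: (?P<ip>[^)]*)\) "
    r"\(interface: (?P<iface>[^)]*)\) "
    r"\(real interface: (?P<real>[^)]*)\)"
)


def parse_newwanip(line):
    """Return a dict with ip, iface and real, or None if the line does not match."""
    match = NEWWANIP_RE.search(line)
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


working = ("May 23 21:18:31 FW1 php-fpm[74460]: /rc.newwanip: rc.newwanip: on "
           "(IP address: x.x.x.x) (interface: WAND[opt5]) (real interface: pppoe0).")
broken = ("May 23 21:23:41 FW2 php-fpm[854]: /rc.newwanip: rc.newwanip: on "
          "(IP address: ) (interface: WAND[opt5]) (real interface: pppoe0).")

for line in (working, broken):
    entry = parse_newwanip(line)
    status = "no address detected" if entry["ip"] == "" else "address detected"
    print(entry["real"], status)
```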
-
@stephenw10 so i seem to have tracked it down...
The ISP gives me a small block of legacy IP (x.x.x.64/28), and the WAN address assigned to the PPPoE session (x.x.x.65) is the same address as the first usable one in the routable /28 block.
I have the same address assigned to a DMZ interface, which other devices on this VLAN use as their gateway. When this address is active on the DMZ interface, rc.newwanip doesn't detect it on pppoe0. If I disable DMZ, rc.newwanip detects pppoe0 just fine.
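To make the overlap concrete, here is a small Python sketch using the standard `ipaddress` module. The real addresses are redacted in this thread, so `203.0.113.64/28` (a documentation range) is an assumed stand-in for x.x.x.64/28, and the /32 mask on the WAN side is likewise an assumption for the sketch.

```python
import ipaddress

# Placeholder addresses from the TEST-NET-3 documentation range (assumption).
wan = ipaddress.ip_interface("203.0.113.65/32")  # address on the PPPoE session
dmz = ipaddress.ip_interface("203.0.113.65/28")  # same address on the DMZ interface

same_address = wan.ip == dmz.ip
networks_overlap = wan.network.overlaps(dmz.network)
first_usable = next(dmz.network.hosts())

print("same address on both interfaces:", same_address)
print("WAN /32 falls inside the DMZ /28:", networks_overlap)
print("first usable address in the block:", first_usable)
```

So the same host address is live on two interfaces at once, which is the situation in which the lookup on pppoe0 comes back empty.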
With 22.05 this worked fine. It's a somewhat weird setup, but I guess the ISP doesn't want to waste another expensive legacy address by assigning the pppoe0 interface a separate address. Any idea what might have changed post 22.05 that could have triggered this?
The IPv6 setup is a little different, there is no GUA assigned to the WAN interface, but it doesn't need one as it routes via the link-local addresses, and thus doesn't have any conflict.
-
You can't have the same subnet on two interfaces unless they are bridged. If the ISP was handing you a /32 for the WAN it could work as a more specific route. PPPoE is a point to point protocol so it doesn't actually care what the IPs are.
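As an illustration of the "more specific route" point, here is a generic longest-prefix-match sketch in Python. It is not how FreeBSD or pfSense implements its routing table, and the `203.0.113.x` addresses and the `lookup` helper are placeholders invented for the example.

```python
import ipaddress


def lookup(routes, destination):
    """Pick the interface of the matching route with the longest prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, ifname) for net, ifname in routes if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)[1]


# Hypothetical table: the /28 on the DMZ plus a /32 host route on the PPPoE link.
routes = [
    (ipaddress.ip_network("203.0.113.64/28"), "dmz"),
    (ipaddress.ip_network("203.0.113.65/32"), "pppoe0"),
]

print(lookup(routes, "203.0.113.65"))  # the /32 wins over the /28
print(lookup(routes, "203.0.113.70"))  # only the /28 matches
```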
It probably shouldn't have worked in 22.05.
-
@stephenw10 The ISP does hand a /32 for the WAN address, but then the only way to make use of the /28 is to assign the same address with the /28 mask to one of the other interfaces. It's worked fine for many years (since 2.1.x) and all ISPs in the country seem to hand out routable legacy addressing this way.
Is there any other way to configure it?
Speaking of identical addressing, the logic in the UI also prevents using the same IPv6 link-local address on multiple interfaces, but this is explicitly allowed with IPv6. For instance, it's quite common to assign fe80::1 on each VLAN as the primary gateway.
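To show what I mean about link-local addresses, here is a hypothetical duplicate check in Python that treats a repeated address as a conflict unless it is link-local. This is not the pfSense validation code; `find_conflicts` and the interface names are invented for the sketch.

```python
import ipaddress
from collections import defaultdict


def find_conflicts(assignments):
    """Return {address: [interfaces]} for addresses reused across interfaces.

    Link-local addresses are skipped, since they are only meaningful per link.
    """
    seen = defaultdict(list)
    for ifname, addr in assignments:
        seen[ipaddress.ip_address(addr)].append(ifname)
    return {
        str(addr): ifnames
        for addr, ifnames in seen.items()
        if len(ifnames) > 1 and not addr.is_link_local
    }


assignments = [
    ("vlan10", "fe80::1"),
    ("vlan20", "fe80::1"),       # same link-local gateway on another VLAN
    ("pppoe0", "203.0.113.65"),  # placeholder for the redacted WAN address
    ("dmz", "203.0.113.65"),
]

print(find_conflicts(assignments))
```

With these inputs only the repeated global address is reported, and fe80::1 on both VLANs passes.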
-
You can add the additional public IPs as VIPs on the WAN and 1:1 NAT them to internal hosts. But that is a significant change. I'm not sure what might have changed in 23.01 that prevents it pulling the more specific route for a /32.
Does this work without the HA part?
Are you able to test 23.05?
-
@stephenw10 23.05 behaves the same as 23.01..
Removing HA for the pppoe parent makes no difference...
Removing HA on the DMZ interface works, so it seems having the WAN address match a CARP VIP has somehow broken between 22.05 and 23.01.
NAT breaks a lot of stuff, so I'd want to avoid it.