No internet connection after upgrade from April 20 snapshot to May 19
-
So I did some further analysis. If PPPoE is configured as WAN on an old (pre-May) version, the system uses the PPPoE gateway as the default gateway, but the saved config.xml has ZERO entries in the <gateways> section. If I change something in my WAN_PPPoE gateway on the old April version, save, then change it back and save again, the default gateway appears in config.xml. After upgrading that configuration to the May version I have an internet connection from boot, and it survives a reboot, with no need for the "Use non-local gateway" option.
On a new installation, however, if I use the wizard to create PPPoE on WAN, I have to manually set the default route to the gateway received from the ISP via PPPoE, which appears in the list. After the next reboot there is again no internet connection, and I need to set the "Use non-local gateway" option to make it work permanently.
I have also added some images. The first one shows some strange IPv6 information after manually setting the default IPv4 gateway.
The next one shows the default routes and configured gateways after configuring PPPoE from scratch on a new system.
At the end there is a comparison of two config files. The one on the left was created by manipulating the gateway on the old version to make the default gateway appear, followed by a successful upgrade to the May version. The one in the right pane is a pure May config file AFTER selecting the default gateway manually and enabling the "Use non-local gateway" option. Both configurations give a working internet connection. It looks like there is a bug in parsing and writing this section, and another bug that prevents setting PPPoE, or any dynamic gateway, as default on a new system.
-
I am pretty sure that the changes made in this commit https://github.com/pfsense/pfsense/commit/43a9b03deb9db482713dfa1218662bce8b6360ce#diff-1332c372788c9e1a8c6c9bae9ebb55a5
are the root of the problem with the default dynamic gateway on PPPoE, at least in my case. I think it all revolves around system.inc, in the deleted lines 647-705, because restoring them fixed the problem with the default gateway.
Finally, steps to reproduce the problem. Set up your own pfSense firewall, virtual or on real hardware; you will need two Ethernet cards. I used Auto install on ZFS, and after a successful install I just ran the wizard that starts automatically after logging into the WebGUI. I changed nothing except selecting PPPoE as the WAN type for IPv4. After finishing the wizard steps you will have a configuration with NO default ROUTE and therefore NO internet connection.
-
https://redmine.pfsense.org/issues/8504#change-36668
I thought this was a similar problem, but it was also the wrong suspect.
-
@w0w said in No internet connection after upgrade from April 20 snapshot to May 19:
https://redmine.pfsense.org/issues/8504#change-36668
That is a completely different gateway issue, unrelated to this one. That was a failed gateway upgrade problem; this is a separate issue having to do with purely dynamic gateways not being present in config.xml. If there isn't already a ticket for it, it should have one.
-
@jimp, in my case, if I follow the guidelines from that closed ticket and install 2.4.3, everything works like a charm. But if I upgrade this working installation to the latest 2.4.4 version, it no longer has a default gateway. That does not mean your upgrade code fails; yes, it is a different issue, caused by the different commit mentioned above. I also understand that you have more information about the closed issue and that you are right in your conclusions, but from my side it looked like a similar issue, and my conclusions were based only on the guideline posted in the redmine ticket.
If I understand correctly, the removed code, which maintained dynamic gateways and made them default without placing them into config.xml, caused some other problems?
-
Completely dynamic gateways do not need an entry in config.xml, they never have. They are dynamic and maintained internally. There is nothing in config.xml so there is nothing for the upgrade code to touch.
Again, it's apples and oranges. The symptoms are similar but they are completely different problems.
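
To make the distinction concrete, here is a minimal sketch that inspects the <gateways> section of a config.xml fragment. The XML fragments are illustrative only, not taken from a real system; the element names `gateway_item`, `name`, and `defaultgw4` are assumptions (only `defaultgw6` appears later in this thread, in the patch against gwlb.inc), and `summarize_gateways` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative fragments only. Element names below <gateways> are assumed.
CONFIG_DYNAMIC_ONLY = "<pfsense><gateways></gateways></pfsense>"
CONFIG_AFTER_SAVE = """
<pfsense>
  <gateways>
    <gateway_item>
      <interface>wan</interface>
      <gateway>dynamic</gateway>
      <name>WAN_PPPOE</name>
    </gateway_item>
    <defaultgw4>WAN_PPPOE</defaultgw4>
  </gateways>
</pfsense>
"""


def summarize_gateways(config_xml: str) -> dict:
    """Report which gateways are stored in the config and which is the IPv4 default."""
    root = ET.fromstring(config_xml)
    section = root.find("gateways")
    if section is None:
        return {"items": [], "defaultgw4": None}
    names = [item.findtext("name") for item in section.findall("gateway_item")]
    # An absent or empty element both count as "no default stored".
    default = section.findtext("defaultgw4") or None
    return {"items": names, "defaultgw4": default}


print(summarize_gateways(CONFIG_DYNAMIC_ONLY))
print(summarize_gateways(CONFIG_AFTER_SAVE))
```

A purely dynamic setup corresponds to the first fragment: nothing is stored, so there is nothing for upgrade code to convert.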
-
@jimp, yes, it's an absolutely different issue, I see that now.
If dynamic gateways don't need an entry, then why was the code that maintained them removed? I mean those system.inc changes.
I am also not sure about redmine tickets for this problem. I don't see any right now, but that may just be because I'm blind.
-
I don't know, it happened as a part of a larger feature merge, that's what still needs to be investigated. It's possible that it was not intentional, or it wasn't compatible with the new features, etc. That is what remains to be debugged and tested for this specific issue.
-
Looking through the code the same actions seem to be taken now but they are split into separate functions. It's still possible there is some difference but I haven't spotted it yet.
I've also tried replicating the problem on several VMs here but I haven't been able to make one end up without a default gateway in the OS routing table. I have a few purely dynamic gateway VMs but they are all DHCP. Since I don't currently have a way to test one that is purely PPPoE, it may be that the issue is specific to only having PPPoE dynamic gateways.
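
For anyone who wants to check this on their own box, a rough sketch of testing for a default route by parsing `netstat -rn` output follows. The sample tables are made up for illustration (including the 10.30.20.1 gateway address), the column layout is assumed to be the usual FreeBSD Destination/Gateway/Flags/Netif, and `default_routes` is a hypothetical helper.

```python
# Made-up sample output, assuming the FreeBSD column order
# Destination / Gateway / Flags / Netif.
SAMPLE_WITH_DEFAULT = """\
Internet:
Destination        Gateway            Flags     Netif Expire
default            10.30.20.1         UGS      pppoe0
10.30.20.1         link#8             UH       pppoe0
127.0.0.1          link#4             UH          lo0
"""

SAMPLE_WITHOUT_DEFAULT = """\
Internet:
Destination        Gateway            Flags     Netif Expire
10.30.20.1         link#8             UH       pppoe0
127.0.0.1          link#4             UH          lo0
"""


def default_routes(netstat_output: str) -> list:
    """Return (gateway, interface) pairs for rows whose destination is 'default'."""
    routes = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "default":
            routes.append((fields[1], fields[3]))
    return routes


print(default_routes(SAMPLE_WITH_DEFAULT))
print(default_routes(SAMPLE_WITHOUT_DEFAULT))
```

An empty result for the IPv4 table is the symptom being reported in this thread.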
The only thing I have noticed is that despite actually having a default gateway at the OS, these show "None" in the Default Gateway IPv4 drop-down and nothing in the Default column of the gateway list. There is likely a need to improve the handling of that field. Though I'm still not sure we should even offer a "None" option for IPv4 at this point. The previous code forced one default at all times.
I need to check into a couple unrelated things now but I'll try to set up a PPPoE-only VM if I can and see what happens.
@PiBa may need to have a look as well, he developed the default gateway changes that were merged into 2.4.4.
-
@jimp
Thank you! I hope you will find something.
-
I tried a few things but have not yet found a scenario in which the default route goes 'missing'.
I am currently testing with 2 pfSense boxes, one being the PPPoE server and the other the client. I can change the settings on the server, and the client loses its connection for a few seconds and then picks it up again, including a new gateway IP and a different client IP.
The PPPoE gateway is selected as the default in my config in the WebGUI. Even though it is an 'automatically generated' gateway, it does show in the selection boxes, and should be selected as the default.
The 'non-local gateway' option should not be related to PPPoE connections. It was first created for OVH datacenter usage, where they have 'regular' network connections but do need to route traffic out over the specified interface.
Either way, the gateway selection box being empty would be strange.
I've also installed a new pfSense and went through the 'wizard', in which case the default gateway indeed stays as 'none', although other options are available in the drop-down when it is clicked. I suppose the wizard needs to be changed to fill in the new selection option. After selecting WAN_PPPOE manually, though, the default route gets added properly.
The only thing I could find so far is that the 'installation wizard' needs a little fix. When configured, the actual code handling those settings 'seems' to work properly in my tests.
-
@piba said in No internet connection after upgrade from April 20 snapshot to May 19:
The ‘non-local gateway’ option
I am also not sure why it's needed in my case, but I think it triggers something else and the connection is restored. I do think this may be some other problem, but one also related to those changes, because restoring system.inc fixes everything. One scenario is that when you select this gateway manually, it is stored in config.xml and is then processed in some other way that also requires this option, but I am not sure about that. Maybe it is a second scenario: adding WAN_PPPoE manually creates one entry in config.xml, and checking the ‘non-local gateway’ option just saves my dynamic gateway into config.xml, and that's why it works.
I can do some tests tomorrow, I hope.
The one thing that also confuses me is that if I select WAN_PPPOE manually, it also works without the ‘non-local gateway’ option, but only until re-connection or reboot.
@piba said in No internet connection after upgrade from April 20 snapshot to May 19:
Only thing i could find sofar is that the ‘installation wizard’ needs a little fix…
But this little fix does not solve the problem of having no default gateway after an upgrade, since everything is already configured and config.xml does not contain a dynamic gateway entry.
-
A quick test showed that the second scenario posted above is true.
There is no need for this option at all; just pressing 'Save' in the gateway options does the same thing.
-
Okay, it seems I can reproduce it now with a reboot while the WAN NIC is disconnected. After enabling the WAN NIC it does get an IP, but the default route isn't set. Then, when the PPPoE server side is restarted, the PPPoE client automatically reconnects and does set its default route that time.
I'm now trying to figure out where and how the first connection fails to set the gateway while the second one does. It seems to me that both should follow almost the same steps when the connection is established.
-
FYI
What I'm seeing in my system log is this: rc.newwanip runs both times, but only the second time does it report "Default gateway setting Interface WAN_PPPOE Gateway as default".
I presume you have something similar to the first one in your log?

    Jun 9 15:51:30 php-fpm 321 /rc.start_packages: Restarting/Starting all packages.
    Jun 9 15:51:28 check_reload_status Starting packages
    Jun 9 15:51:28 php-fpm 322 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 10.30.20.163 -> 10.30.20.163 - Restarting packages.
    Jun 9 15:51:26 php-fpm 322 /rc.newwanip: Creating rrd update script
    Jun 9 15:51:26 php-fpm 322 /rc.newwanip: Resyncing OpenVPN instances for interface WAN.
    Jun 9 15:51:23 php-fpm 322 /rc.newwanip: Default gateway setting Interface WAN_PPPOE Gateway as default.
    Jun 9 15:51:23 php-fpm 322 /rc.newwanip: rc.newwanip: on (IP address: 10.30.20.163) (interface: WAN[wan]) (real interface: pppoe0).
    Jun 9 15:51:23 php-fpm 322 /rc.newwanip: rc.newwanip: Info: starting on pppoe0.
    Jun 9 15:51:22 ppp [wan] IFACE: Rename interface ng0 to pppoe0

    Jun 9 15:47:55 php-fpm 73857 /rc.start_packages: Restarting/Starting all packages.
    Jun 9 15:47:54 check_reload_status Starting packages
    Jun 9 15:47:54 php-fpm 321 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 10.30.20.163 -> 10.30.20.163 - Restarting packages.
    Jun 9 15:47:52 php-fpm 321 /rc.newwanip: Creating rrd update script
    Jun 9 15:47:52 php-fpm 321 /rc.newwanip: Resyncing OpenVPN instances for interface WAN.
    Jun 9 15:47:48 php-fpm 321 /rc.newwanip: rc.newwanip: on (IP address: 10.30.20.163) (interface: WAN[wan]) (real interface: pppoe0).
    Jun 9 15:47:48 php-fpm 321 /rc.newwanip: rc.newwanip: Info: starting on pppoe0.
    Jun 9 15:47:47 ppp [wan] IFACE: Rename interface ng0 to pppoe0
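
To compare a longer log against these two excerpts, something like the following sketch could flag which rc.newwanip runs set the default gateway. It assumes the log is listed newest first, as it is here, and that each run begins with the "Info: starting on" message; `newwanip_runs` is a hypothetical helper, and the sample lines are a subset of the excerpts in this post.

```python
import re

# Timestamp (e.g. "Jun 9 15:47:48") followed by the rest of the entry.
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]{8})\s+(.*)$")

# Subset of the log lines quoted in this post, newest first.
LOG = [
    "Jun 9 15:51:23 php-fpm 322 /rc.newwanip: Default gateway setting Interface WAN_PPPOE Gateway as default.",
    "Jun 9 15:51:23 php-fpm 322 /rc.newwanip: rc.newwanip: on (IP address: 10.30.20.163) (interface: WAN[wan]) (real interface: pppoe0).",
    "Jun 9 15:51:23 php-fpm 322 /rc.newwanip: rc.newwanip: Info: starting on pppoe0.",
    "Jun 9 15:51:22 ppp [wan] IFACE: Rename interface ng0 to pppoe0",
    "Jun 9 15:47:48 php-fpm 321 /rc.newwanip: rc.newwanip: on (IP address: 10.30.20.163) (interface: WAN[wan]) (real interface: pppoe0).",
    "Jun 9 15:47:48 php-fpm 321 /rc.newwanip: rc.newwanip: Info: starting on pppoe0.",
    "Jun 9 15:47:47 ppp [wan] IFACE: Rename interface ng0 to pppoe0",
]


def newwanip_runs(log_lines, newest_first=True):
    """Return (start_timestamp, default_gateway_was_set) for each rc.newwanip run."""
    ordered = list(reversed(log_lines)) if newest_first else list(log_lines)
    runs = []
    for line in ordered:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp, message = match.groups()
        if "rc.newwanip: Info: starting on" in message:
            runs.append([timestamp, False])  # a new run begins
        elif "Default gateway setting" in message and runs:
            runs[-1][1] = True  # the current run set the default gateway
    return [tuple(run) for run in runs]


for start, was_set in newwanip_runs(LOG):
    print(start, "default gateway set" if was_set else "NO default gateway message")
```

A run reported without the message matches the failing first connection described above.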
-
@piba
Yep, looks exactly like that. Both. I've just found that on the latest snapshot it works a little differently, or I just missed it before.
After setting my dynamic gateway manually as default I see this line:
/rc.newwanip: Default gateway setting Interface WAN_PPPOE Gateway as default.
Then I rebooted the firewall and looked at the system log again. When PPPoE starts on boot, I don't see this line anymore; the system log is exactly like your second one and the default route is missing, of course. If I manually reconnect PPPoE without changing anything else, I see this line again and everything works as it should.
-
It seems the dynamic PPPoE gateway does not have a status yet when it hasn't connected before, and the code assumes it's a gateway group because it cannot find the status a normal gateway usually has.
Would you be able to try this little patch with 1 changed line?

     src/etc/inc/gwlb.inc | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)

    diff --git a/src/etc/inc/gwlb.inc b/src/etc/inc/gwlb.inc
    index 9a059a7..e0b94be 100644
    --- a/src/etc/inc/gwlb.inc
    +++ b/src/etc/inc/gwlb.inc
    @@ -1032,7 +1032,7 @@ function fixup_default_gateway($ipprotocol, $gateways_status, $gateways_arr) {
     	} else {
     		$gwdefault = $config['gateways']['defaultgw6'];
     	}
    -	if (isset($gateways_status[$gwdefault])) {
    +	if (isset($gateways_arr[$gwdefault])) {
     		// the configured gateway is a regular one. (not a gwgroup) use it as is..
     		$dfltgwname = $gwdefault;
     	} else {
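
As a rough illustration of the idea behind this one-line change, here is a small Python model (not the actual PHP). It assumes that a gateway which has never connected is present in the gateways array but has no status entry yet; both function names and the dictionary contents are hypothetical, and a plain membership test stands in for PHP's isset(). Returning None stands for falling through to the else branch, whose body is not shown in the patch.

```python
def resolve_default_old(gwdefault, gateways_status, gateways_arr):
    """Pre-patch check: 'regular gateway' only if a status entry exists."""
    return gwdefault if gwdefault in gateways_status else None


def resolve_default_patched(gwdefault, gateways_status, gateways_arr):
    """Patched check: 'regular gateway' if it is a known gateway at all."""
    return gwdefault if gwdefault in gateways_arr else None


# Hypothetical state right after boot: the dynamic gateway is known,
# but no status has been recorded for it yet.
gateways_arr = {"WAN_PPPOE": {"interface": "pppoe0", "dynamic": True}}
gateways_status = {}

print(resolve_default_old("WAN_PPPOE", gateways_status, gateways_arr))
print(resolve_default_patched("WAN_PPPOE", gateways_status, gateways_arr))
```

In this model the old check falls through to the gateway group path on first boot, while the patched check keeps the configured name.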
-
Redmine ticket and fix submitted for review: https://redmine.pfsense.org/issues/8561
-
@piba
I have manually patched this line, but the problem still exists.
I need to select the gateway manually, and it works until reboot; after a reboot I need to 'Disconnect'/'Connect' PPPoE to make it work.
Also, I am confused by this:
-
That it says 'Default (IPv6)' seems just a 'display issue'. It is determined by a function that doesn't take dynamic gateways into account currently. Anyhow added a fix to the PR for that https://github.com/pfsense/pfsense/pull/3947/commits/092abdb6005072365bc860966b0e2ffce8d85e1b
Can you add the logging below and run another test? Please let me know what it reports. My 'reproduction' is no longer failing with the patch above and I'm currently out of ideas where to look. Sorry, but this is probably going to take a few rounds of trial and error:

     src/etc/inc/gwlb.inc | 2 ++
     1 file changed, 2 insertions(+)

    diff --git a/src/etc/inc/gwlb.inc b/src/etc/inc/gwlb.inc
    index 7b157b5..3a70fe3 100644
    --- a/src/etc/inc/gwlb.inc
    +++ b/src/etc/inc/gwlb.inc
    @@ -1105,6 +1105,8 @@ function fixup_default_gateway($ipprotocol, $gateways_status, $gateways_arr) {
     			}
     		}
     	}
    +	log_error("fixup_default_gateway dfltgwdown:{$dfltgwdown} upgw:{$upgw} dfltgwname:{$dfltgwname} , gwdefault:{$gwdefault}");
    +	log_error("gateways_arr:".print_r($gateways_arr,true));
     	if ($dfltgwdown == true && !empty($upgw)) {
     		setdefaultgateway($gateways_arr[$upgw]);
     	} else if (!empty($dfltgwname)) {