pfsense 2.4.4 fails all pppoe's after disabling one
-
Akismet is giving me pains...
So, it can be replicated on a physical box (an Intel NUC).
-
Can you attempt to replicate it without using VLANs? Or is that a hard upstream requirement?
I'm curious because PPPoE over VLANs has a bit more to it than PPPoE on its own. I don't have a test setup with multiple PPPoE interfaces that uses VLANs, just regular interfaces.
-
Since this is in a production environment I'm doing things remote as there are no operations at the moment.
Obviously I cannot move things physically, and the nuc box has only one physical interface, so I can't just plug a cable.
Since virtualisation is out of scope, it has to be the VLANs.
I presume that I'm not the only one with more than one PPPoE interface, and since it's easy to reproduce, it would have surfaced during the beta phase too.
What else could I look at?
-
Nothing else at the moment. You are certainly not the only person with multiple PPPoE interfaces, though having those along with VLANs is less common, if not exactly rare. The need to fiddle with them and disable them is probably not something most people do on a regular basis, though.
If they all work fine until you disable one, just don't disable them and they'll be OK. Not ideal, but at least until the cause can be isolated they still function.
-
@jimp
I just tested rebooting one of the CPE devices. Thankfully the PPPoE sessions came back (on all three pfSense instances) with no further issues.
However, my situation is a bit worse, since one of the PPPoE connections doesn't support multiple logins and dial on demand is not really an option in a multi-WAN environment.
So when it ends up being active on the wrong instance I need to restart it, thus the issue at hand. Obviously rebooting isn't really an option.
I have scheduled a change to a WAN CARP setup this week, so I will be needing this very soon.
I'm reverting to 2.4.3 and will keep the physical box at 2.4.4 for test purposes.
Is there a ticket opened for this so I can monitor the issue?
Kind regards
-
We can't realistically open a ticket for this until we know more about it, at least a way to replicate it in a lab.
Your use case sounds very unusual, though. I wouldn't expect a PPPoE session to be active on an incorrect instance unless something else was wrong with your setup.
-
Replicating is straightforward. Just set up a PPPoE session over a VLAN; supposedly that is all that is needed. (I'll try to test this soon too.)
As for the active PPPoE session, it is irrelevant to the situation we are facing here.
Having said that, there is nothing strange about it either.
As per the HA recommendations, PPP interfaces should be configured with dial on demand, an idle timeout, and no monitoring on the secondary node.
This works fine until multi-WAN comes into play. Without gateway monitoring, pfSense can't tell whether the link works, which creates issues, especially when the PPPoE interface belongs to more than one gateway group and is used for redundancy and traffic policy.
If the secondary unit becomes master, there is no easy way to bring up dial on demand, because it is a lower-priority interface. When it does come up, the idle timeout doesn't trigger easily.
It would be possible if there was an option to tie a PPPoE interface to the CARP status of the LAN interface.
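As a rough illustration of the "tie PPPoE to CARP status" idea, here is a minimal Python sketch of just the decision step. It assumes the usual FreeBSD `ifconfig` output format for CARP (a line such as `carp: MASTER vhid 1 advbase 1 advskew 0`); the function names are hypothetical, and actually connecting or disconnecting the PPPoE interface is deliberately left out because that part depends on pfSense internals.

```python
import re

# Assumption: ifconfig prints one line per CARP VHID on the interface, e.g.
#   carp: MASTER vhid 1 advbase 1 advskew 0
CARP_LINE = re.compile(
    r"^\s*carp:\s+(?P<state>MASTER|BACKUP|INIT)\s+vhid\s+(?P<vhid>\d+)",
    re.MULTILINE,
)


def carp_states(ifconfig_output):
    """Return {vhid: state} for every CARP line found in the ifconfig text."""
    return {
        int(m.group("vhid")): m.group("state")
        for m in CARP_LINE.finditer(ifconfig_output)
    }


def pppoe_should_be_up(ifconfig_output, vhid):
    """Hypothetical policy: keep PPPoE up only while this node is MASTER."""
    return carp_states(ifconfig_output).get(vhid) == "MASTER"
```

A wrapper script would feed this the text of `ifconfig <lan interface>` and act on the boolean; that glue is the piece that would need maintaining across releases.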
I understand that it can be scripted, but maintaining such scripts across releases is not exactly straightforward in the long run.
-
I believe I’ve got the same issue.
I’m running pfSense on a multi-NIC Intel Atom box. It’s got Realtek interfaces (“re” drivers). I’ve also got another box running an i3 CPU with Intel interfaces (“igb” drivers).
My WAN connections are PPPoE. The PPPoE interface is set up on a VLAN on the NIC.
The other day, I noticed my LAN interface didn’t have an IPv6 address (I’ve got IPv6 PD through my ISP with Track Interface). So I just went to Interfaces > LAN and clicked “Save” then “Apply” without changing any settings. When I did that, the whole router stopped passing traffic. I couldn’t access it from the WAN side or from the LAN side. I gracefully powered it down by pressing the power button, which would’ve caused an ACPI shutdown, then powered it back up. When it came back up I checked the logs.
This message repeats many times:
Nov 18 13:46:15 ppp [wan_link0] Link: reconnection attempt 14916 in 2 seconds
Nov 18 13:46:15 ppp [wan_link0] LCP: Down event
Nov 18 13:46:15 ppp [wan_link0] Link: DOWN event
Nov 18 13:46:15 ppp [wan_link0] can't remove hook mpd8627-0 from node "[12]:": No such file or directory
Nov 18 13:46:15 ppp [wan_link0] PPPoE: can't connect "[12]:"->"mpd8627-0" and "[10]:"->"left": No such file or directory
Nov 18 13:46:15 ppp [wan_link0] Link: reconnection attempt 14915
Nov 18 13:46:13 ppp [wan_link0] Link: reconnection attempt 14915 in 2 seconds
Nov 18 13:46:13 ppp [wan_link0] LCP: Down event
Nov 18 13:46:13 ppp [wan_link0] Link: DOWN event
Nov 18 13:46:13 ppp [wan_link0] can't remove hook mpd8627-0 from node "[12]:": No such file or directory
Nov 18 13:46:13 ppp [wan_link0] PPPoE: can't connect "[12]:"->"mpd8627-0" and "[10]:"->"left": No such file or directory
Nov 18 13:46:13 ppp [wan_link0] Link: reconnection attempt 14914
Nov 18 13:46:10 ppp [wan_link0] Link: reconnection attempt 14914 in 3 seconds
Nov 18 13:46:10 ppp [wan_link0] LCP: Down event
Nov 18 13:46:10 ppp [wan_link0] Link: DOWN event
Nov 18 13:46:10 ppp [wan_link0] can't remove hook mpd8627-0 from node "[12]:": No such file or directory
Nov 18 13:46:10 ppp [wan_link0] PPPoE: can't connect "[12]:"->"mpd8627-0" and "[10]:"->"left": No such file or directory
Nov 18 13:46:10 ppp [wan_link0] Link: reconnection attempt 14913
Nov 18 13:46:09 ppp [wan_link0] Link: reconnection attempt 14913 in 1 seconds
Nov 18 13:46:09 ppp [wan_link0] LCP: Down event
Nov 18 13:46:09 ppp [wan_link0] Link: DOWN event
Is this a bug?
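For anyone comparing notes, a small Python sketch like the following could tally how often each PPP link logs the hook error shown in this post. It is only an illustration: the function name is made up, and it assumes the PPP log lines have been pasted or exported as plain text, one entry per line, in the same format as the excerpt.

```python
import re
from collections import Counter

# Matches entries such as:
#   Nov 18 13:46:15 ppp [wan_link0] PPPoE: can't connect "[12]:"->"mpd8627-0"
#   and "[10]:"->"left": No such file or directory
HOOK_ERROR = re.compile(
    r"\[(?P<link>[^\]]+)\] PPPoE: can't connect .*No such file or directory"
)


def count_hook_errors(lines):
    """Count 'can't connect ... No such file or directory' errors per link."""
    counts = Counter()
    for line in lines:
        match = HOOK_ERROR.search(line)
        if match:
            counts[match.group("link")] += 1
    return counts
```

A steadily growing count for a link, together with a rising "reconnection attempt" number, would suggest the session is stuck in the retry loop rather than recovering on its own.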
-
@breakaway It sure is. However, Netgate doesn't want to accept it, since they cannot replicate it in a lab. Any luck on replicating this?
-
I have replicated it twice. Once on a device with Realtek NICs (re0, re1 etc.) and once on an Intel NIC device (igb0, igb1 etc.).
All you have to do is go to a LAN interface, click "Save" and "Apply". Internet goes away almost immediately. See screenshots below.
I think it is a requirement to have PPPoE interface ON TOP of a VLAN interface to make this happen. This issue does not exist if your WAN is Static IP.
If I go Status > Interfaces and hit "connect" - that seems to bring it back. But this could be a real issue if I made a change on my LAN interface while I wasn't at home.
-
I've got the same trouble with a PPPoE session not recovering after it fails when applying changes.
My instance:
pfSense 2.4.4-RELEASE (amd64) built on Thu Sep 20 09:03:12 EDT 2018 FreeBSD 11.2-RELEASE-p3
KVM VM with pass-through cards IG0
VLAN6: IG0.6
WAN: PPPOE1(igb0.6) Movistar provider with user and pass, service name null
Other settings:
Hardware Checksum Offloading Disabled
Hardware TCP Segmentation Offloading Disabled
Hardware Large Receive Offloading Disabled
DNS 1.1.1.1 assigned to WAN GW on DNS Server Settings
When I apply changes on assignments or interface configuration, it just goes down.
If I'm on local management of the firewall, the process takes a long time to complete; other menus are fast.
Hope it helps to replicate.
-
Ran into the same issue when updating Track Interface on LAN, just as @breakaway described. I've been bitten by it once before when setting up a new IPsec VTI. It did not reconnect and was stuck; I had to reboot the router to get things working smoothly again.
-
Hi everybody, I have the same problem on my pfSense 2.4.4 VM on top of ESX. I have 5 PPPoE WANs on a VLAN interface (one is always left disabled). Yesterday, after disabling one of them (because of a stuck modem), the other three went down and I lost remote access. Even when I brought them up manually (once I got home), they wouldn't stay up. Rebooting brought everything back to an operative state.
The PPP logs are very similar to the OP's. I never had this issue in the past, but I don't know whether it is a problem with v. 2.4.4 or with the combination of PPP and VLANs. I migrated to VLANs so that I could add a new PPP WAN connection without rebooting the firewall; before that I had a virtual interface for every single PPP/WAN interface.
-
Hmm, I am seeing this now although I did not initially after upgrading. My PPPoE interfaces usually stay up for months.
What exactly did you do to trigger this?
Whilst testing something else I re-saved a WAN connection and it took down both PPPoE connections with the same errors logged:
Nov 21 16:46:14 ppp [opt4_link0] Link: reconnection attempt 17 in 2 seconds
Nov 21 16:46:14 ppp [opt4_link0] LCP: Down event
Nov 21 16:46:14 ppp [opt4_link0] Link: DOWN event
Nov 21 16:46:14 ppp [opt4_link0] can't remove hook mpd98452-0 from node "[26]:": No such file or directory
Nov 21 16:46:14 ppp [opt4_link0] PPPoE: can't connect "[26]:"->"mpd98452-0" and "[17e]:"->"left": No such file or directory
Nov 21 16:46:14 ppp [opt4_link0] Link: reconnection attempt 16
Nov 21 16:46:13 ppp [wan_link0] Link: reconnection attempt 25 in 1 seconds
Nov 21 16:46:13 ppp [wan_link0] LCP: Down event
Nov 21 16:46:13 ppp [wan_link0] Link: DOWN event
Nov 21 16:46:13 ppp [wan_link0] can't remove hook mpd17126-0 from node "[1b]:": No such file or directory
Nov 21 16:46:13 ppp [wan_link0] PPPoE: can't connect "[1b]:"->"mpd17126-0" and "[4e]:"->"left": No such file or directory
Nov 21 16:46:13 ppp [wan_link0] Link: reconnection attempt 24
However I was able to bring them back up by simply clicking connect in Status > Interfaces.
I am also running over VLANs.
Steve
-
@stephenw10 said in pfsense 2.4.4 fails all pppoe's after disabling one:
Hmm, I am seeing this now although I did not initially after upgrading. My PPPoE interfaces usually stay up for months.
What exactly did you do to trigger this?
I just click “Save” on my LAN interface, then click “Apply”. I don’t even have to change any settings.
Yes, going to Status > Interfaces and clicking connect fixes it, but what if you need to make a change on a remote pfSense?
-
I can confirm that this is exactly the case. Hitting reconnect brings it back.
However, on a system with multiple PPP connections, VPNs over them etc., a simple change somewhere brings down many things. And obviously, if you are remote, over PPP, things get very, very serious.
And bye bye to PPP interfaces staying up for months too :(
The NIC make doesn't matter, nor whether it is physical or virtual; it happens on all of them.
What strikes me is how few people are using PPP over VLANs.
Can we expect a ticket? A quick fix?
-
Another problem I found is that one type of PPPoE connection won't even reconnect if I just reboot or press the connect button in the Interfaces menu.
I have to rebuild the connection to get connectivity:
remove the PPPoE config, remove the VLAN interface config, then create the VLAN for the interface and the PPPoE settings again.
In my case it is Vodafone Spain, VLAN 100, with the PPPoE session over VLAN 100.
I think there are some issues with VLANs and PPPoE that need to be investigated in release 2.4.4; this wasn't affecting previous versions.
-
I opened a bug to track this. https://redmine.pfsense.org/issues/9148
-
When you see this do you also see logged:
Nov 21 16:45:21 pfsense kernel: vlan1: changing name to 'lagg0.102'
Or similar?
I actually see the connection come back up successfully in the system log and then fail. I think this is actually a symptom of something else.
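To check for that kernel message without scrolling through the whole system log, something along these lines could pull out the rename events. This is just a sketch: the function name is hypothetical, and it assumes the system log is available as plain text lines in the same syslog format as the example above.

```python
import re

# Matches entries such as:
#   Nov 21 16:45:21 pfsense kernel: vlan1: changing name to 'lagg0.102'
RENAME = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel: "
    r"(?P<old>\S+): changing name to '(?P<new>[^']+)'"
)


def find_vlan_renames(lines):
    """Return (timestamp, old_name, new_name) for each interface rename entry."""
    events = []
    for line in lines:
        match = RENAME.match(line)
        if match:
            events.append((match.group("ts"), match.group("old"), match.group("new")))
    return events
```

Comparing those timestamps with the first "reconnection attempt" entries in the PPP log would show whether the VLAN being recreated lines up with the PPPoE sessions dropping.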
-
@stephenw10 Indeed, this is in the system log:
Nov 22 05:36:26 lb kernel: vlan2: changing name to 're0.103'
Nov 22 05:36:32 lb php-fpm[347]: /interfaces.php: MONITOR: VDSL_OTE1_PPP_PPPOE is down, omitting from routing group Default_Gateway_Group_ipv4 80.106.125.100|2.86.52.251|VDSL_OTE1_PPP_PPPOE|14.683ms|0.578ms|90%|down
and then it keeps spitting errors in the PPP log like the ones above.