pfsense 2.4.4 fails all pppoe's after disabling one
-
I just completed the update on an HA cluster.
I'm facing a serious issue on non-HA connections, namely the PPPoE ones.
The error logged is this:
Sep 25 19:55:37 ppp [wan_link0] LCP: Down event
Sep 25 19:55:37 ppp [wan_link0] Link: reconnection attempt 72 in 2 seconds
Sep 25 19:55:37 ppp [opt4_link0] Link: reconnection attempt 66
Sep 25 19:55:37 ppp [opt4_link0] PPPoE: can't connect "[26]:"->"mpd31815-0" and "[24]:"->"left": No such file or directory
Sep 25 19:55:37 ppp [opt4_link0] can't remove hook mpd31815-0 from node "[26]:": No such file or directory
Sep 25 19:55:37 ppp [opt4_link0] Link: DOWN event
Sep 25 19:55:37 ppp [opt4_link0] LCP: Down event
Sep 25 19:55:37 ppp [opt4_link0] Link: reconnection attempt 67 in 1 seconds
Sep 25 19:55:38 ppp [opt4_link0] Link: reconnection attempt 67
Sep 25 19:55:38 ppp [opt4_link0] PPPoE: can't connect "[26]:"->"mpd31815-0" and "[24]:"->"left": No such file or directory
Sep 25 19:55:38 ppp [opt4_link0] can't remove hook mpd31815-0 from node "[26]:": No such file or directory
Rebooting the node fixes everything.
The upgrade was uneventful and all seems to work fine.
All CARP interfaces and functionality work as expected too.
Apart from CARP, there are three PPPoE connections on each node
(the provider allows multiple PPP sessions with common credentials, and assigns different dynamic IPs too).
If I disable one PPPoE interface on the secondary node, I also lose the connections on the other two.
Re-enabling the interface doesn't fix things.
Rebooting the node restores everything back to normal.
This was not the case with 2.4.3.
Update:
I did the same on the primary node.
Same behaviour.
Sep 25 20:06:31 ppp [opt4_link0] LCP: Down event
Sep 25 20:06:31 ppp [opt4_link0] Link: reconnection attempt 27 in 4 seconds
Sep 25 20:06:33 ppp [wan_link0] Link: reconnection attempt 34
Sep 25 20:06:33 ppp [wan_link0] PPPoE: can't connect "[18]:"->"mpd13395-0" and "[16]:"->"left": No such file or directory
Sep 25 20:06:33 ppp [wan_link0] can't remove hook mpd13395-0 from node "[18]:": No such file or directory
Sep 25 20:06:33 ppp [wan_link0] Link: DOWN event
Sep 25 20:06:33 ppp [wan_link0] LCP: Down event
Sep 25 20:06:33 ppp [wan_link0] Link: reconnection attempt 35 in 4 seconds
Sep 25 20:06:35 ppp
This is serious and probably unrelated to HA.
Any ideas are welcome before I revert to 2.4.3.
-
I'm not seeing that happen here. If I disable one, only that one gets disabled. The other(s) keep working.
You'll need to provide more information about your interface configuration. The mpd config files from /var/etc/ would be helpful as well; just redact or remove the username/password info.
Also use the "code" (</>) button when putting log files and other text data in a post; that way it preserves the original formatting.
-
Here we go
<code>
less mpd_wan.conf
startup:
	# configure the console
	set console close
	# configure the web server
	set web close

default:
pppoeclient:
	create bundle static wan
	set bundle enable ipv6cp
	set iface name pppoe0
	set iface disable on-demand
	set iface idle 0
	set iface enable tcpmssfix
	set iface up-script /usr/local/sbin/ppp-linkup
	set iface down-script /usr/local/sbin/ppp-linkdown
	set ipcp ranges 0.0.0.0/0 0.0.0.0/0
	set ipcp enable req-pri-dns
	set ipcp enable req-sec-dns
	#log -bund -ccp -chat -iface -ipcp -lcp -link
	create link static wan_link0 pppoe
	set link action bundle wan
	set link disable multilink
	set link keep-alive 10 60
	set link max-redial 0
	set link disable chap pap
	set link accept chap pap eap
	set link disable incoming
	set link mtu 1488
	set auth authname "oizks7@otenet.gr"
	set auth password xxxxx
	set pppoe service ""
	set pppoe iface vtnet2.105
	open
mpd_wan.conf (END)
</code>
<code>
less mpd_opt4.conf
startup:
	# configure the console
	set console close
	# configure the web server
	set web close

default:
pppoeclient:
	create bundle static opt4
	set bundle enable ipv6cp
	set iface name pppoe1
	set iface disable on-demand
	set iface idle 0
	set iface enable tcpmssfix
	set iface up-script /usr/local/sbin/ppp-linkup
	set iface down-script /usr/local/sbin/ppp-linkdown
	set ipcp ranges 0.0.0.0/0 0.0.0.0/0
	set ipcp enable req-pri-dns
	set ipcp enable req-sec-dns
	#log -bund -ccp -chat -iface -ipcp -lcp -link
	create link static opt4_link0 pppoe
	set link action bundle opt4
	set link disable multilink
	set link keep-alive 10 60
	set link max-redial 0
	set link disable chap pap
	set link accept chap pap eap
	set link disable incoming
	set link mtu 1488
	set auth authname "ouqa6z@otenet.gr"
	set auth password xxxxxx
	set pppoe service ""
	set pppoe iface vtnet2.107
	open
mpd_opt4.conf (END)
</code>
<code>
less mpd_opt8.conf
startup:
	# configure the console
	set console close
	# configure the web server
	set web close

default:
pppoeclient:
	create bundle static opt8
	set bundle enable ipv6cp
	set iface name pppoe3
	set iface disable on-demand
	set iface idle 10
	set iface enable tcpmssfix
	set iface up-script /usr/local/sbin/ppp-linkup
	set iface down-script /usr/local/sbin/ppp-linkdown
	set ipcp ranges 0.0.0.0/0 0.0.0.0/0
	set ipcp enable req-pri-dns
	set ipcp enable req-sec-dns
	#log -bund -ccp -chat -iface -ipcp -lcp -link
	create link static opt8_link0 pppoe
	set link action bundle opt8
	set link disable multilink
	set link keep-alive 10 60
	set link max-redial 0
	set link disable chap pap
	set link accept chap pap eap
	set link disable incoming
	set link mtu 1488
	set auth authname "userA"
	set auth password XXXXX
	set pppoe service "ewifeed"
	set pppoe iface vtnet2.1004
	open
mpd_opt8.conf (END)
</code>
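For anyone else posting their mpd config files, here is a minimal Python sketch for masking the credentials before pasting. It assumes the credentials only appear on `set auth authname` and `set auth password` lines, as in the files above; the function name `redact_mpd_conf` and the placeholder text are made up for illustration.

```python
import re

# Matches mpd "set auth authname ..." and "set auth password ..." lines.
# Group 1 keeps the indentation and the command; the value after it is dropped.
AUTH_LINE = re.compile(
    r"^([ \t]*set[ \t]+auth[ \t]+(?:authname|password)[ \t]+).*$",
    re.MULTILINE,
)

def redact_mpd_conf(text, placeholder="REDACTED"):
    """Return the config text with auth values replaced by a placeholder."""
    return AUTH_LINE.sub(lambda m: m.group(1) + placeholder, text)

if __name__ == "__main__":
    # Hypothetical sample, not a real account.
    sample = (
        "\tset link mtu 1488\n"
        '\tset auth authname "someuser@example.net"\n'
        "\tset auth password secret\n"
        '\tset pppoe service ""\n'
    )
    print(redact_mpd_conf(sample), end="")
```

All other lines pass through unchanged, so the rest of the config stays readable for whoever is helping.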
This is a virtualised environment, under CentOS KVM.
There are two "physical" interfaces: vtnet1, dedicated to LAN traffic, and vtnet2, dedicated to WAN traffic, mapped to physical addresses.
Virtio type interfaces are used.
vtnet2 is VLAN tagged and trunked to a managed switch, where everything is distributed to physical equipment interfaces.
The two nodes (on different KVM hosts) share the same VLANs with the CPE equipment in bridge mode.
It works splendidly.
On the same "physical" interface there are also CARP connections, which are not affected.
The VLAN interfaces used for PPPoE are also configured with IPs so as to have access to the CPE device.
Even disabling (or enabling) the non-PPPoE parent interface leads to the same behaviour: losing all PPPoE connections with the error mentioned above.
If you think virtualisation might be the issue, I could try replicating it on a physical box connected to the same VLANs.
Do note that when all PPPoE connections fail on one box, the same ones (with different IPs/sessions) work on the other box without issues.
-
@netblues Just enabled the same interfaces on a physical box with the same VLAN configuration
```
Sep 25 21:17:00 ppp [opt9_link0] Link: reconnection attempt 8 in 2 seconds
Sep 25 21:17:00 ppp [opt9_link0] LCP: Down event
Sep 25 21:17:00 ppp [opt9_link0] Link: DOWN event
Sep 25 21:17:00 ppp [opt9_link0] can't remove hook mpd23901-0 from node "[419]:": No such file or directory
Sep 25 21:17:00 ppp [opt9_link0] PPPoE: can't connect "[419]:"->"mpd23901-0" and "[417]:"->"left": No such file or directory
Sep 25 21:17:00 ppp [opt9_link0] Link: reconnection attempt 7
Sep 25 21:16:59 ppp [opt5_link0] Link: reconnection attempt 41 in 3 seconds
Sep 25 21:16:59 ppp [opt5_link0] LCP: Down event
Sep 25 21:16:59 ppp [opt5_link0] Link: DOWN event
Sep 25 21:16:59 ppp [opt5_link0] can't remove hook mpd71715-0 from node "[3f9]:": No such file or directory
```
rebooted... and
```
Sep 25 21:21:15 ppp [opt9] 94.70.44.52 -> 80.106.125.100
Sep 25 21:21:15 ppp [opt9] IPCP: LayerUp
Sep 25 21:21:15 ppp [opt9] IPCP: state change Ack-Sent --> Opened
Sep 25 21:21:15 ppp [opt9] IPADDR 94.70.44.52
Sep 25 21:21:15 ppp [opt9] IPCP: rec'd Configure Ack #3 (Ack-Sent)
Sep 25 21:21:15 ppp [opt9] IPADDR 94.70.44.52
Sep 25 21:21:15 ppp [opt9] IPCP: SendConfigReq #3
Sep 25 21:21:15 ppp [opt9] 94.70.44.52 is OK
Sep 25 21:21:15 ppp [opt9] IPADDR 94.70.44.52
Sep 25 21:21:15 ppp [opt9] IPCP: rec'd Configure Nak #2 (Ack-Sent)
Sep 25 21:21:15 ppp [opt9] IFACE: Rename interface ng1 to pppoe3
Sep 25 21:21:15 ppp [opt9] IFACE: Up event
Sep 25 21:21:15 ppp [opt9] f64d:30ff:fe63:cd86 -> 12f3:11ff:fe12:4285
Sep 25 21:21:15 ppp [opt9] IPV6CP: LayerUp
Sep 25 21:21:15 ppp [opt9] IPV6CP: state change Ack-Sent --> Opened
Sep 25 21:21:15 ppp [opt9] IPV6CP: rec'd Configure Ack #1 (Ack-Sent)
Sep 25 21:21:15 ppp [opt9] IPADDR 0.0.0.0
Sep 25 21:21:15 ppp [opt9] IPCP: SendConfigReq #2
Sep 25 21:21:15 ppp [opt9] COMPPROTO VJCOMP, 16 comp. channels, no comp-cid
Sep 25 21:21:15 ppp [opt9] IPCP: rec'd Configure Reject #1 (Ack-Sent)
Sep 25 21:21:15 ppp [opt9] IPV6CP: state change Req-Sent --> Ack-Sent
Sep 25 21:21:15 ppp [opt9] IPV6CP: SendConfigAck #1
Sep 25 21:21:15 ppp [opt9] IPV6CP: rec'd Configure Request #1 (Req-Sent)
Sep 25 21:21:15 ppp [opt9] IPCP: state change Req-Sent --> Ack-Sent
Sep 25 21:21:15 ppp [opt9] IPADDR 80.106.125.100
Sep 25 21:21:15 ppp [opt9] IPCP: SendConfigAck #1
Sep 25 21:21:15 ppp [opt9] 80.106.125.100 is OK
Sep 25 21:21:15 ppp [opt9] IPADDR 80.106.125.100
Sep 25 21:21:15 ppp [opt9] IPCP: rec'd Configure Request #1 (Req-Sent)
Sep 25 21:21:15 ppp [opt9] IPV6CP: SendConfigReq #1
Sep 25 21:21:15 ppp [opt9] IPV6CP: state change Starting --> Req-Sent
Sep 25 21:21:15 ppp [opt9] IPV6CP: Up event
Sep 25 21:21:15 ppp [opt9] COMPPROTO VJCOMP, 16 comp. channels, no comp-cid
Sep 25 21:21:15 ppp [opt9] IPADDR 0.0.0.0
Sep 25 21:21:15 ppp [opt9] IPCP: SendConfigReq #1
Sep 25 21:21:15 ppp [opt9] IPCP: state change Starting --> Req-Sent
Sep 25 21:21:15 ppp [opt9] IPCP: Up event
Sep 25 21:21:15 ppp [opt9] IPV6CP: LayerStart
Sep 25 21:21:15 ppp [opt9] IPV6CP: state change Initial --> Starting
Sep 25 21:21:15 ppp [opt9] IPV6CP: Open event
Sep 25 21:21:15 ppp [opt9] IPCP: LayerStart
Sep 25 21:21:15 ppp [opt9] IPCP: state change Initial --> Starting
Sep 25 21:21:15 ppp [opt9] IPCP: Open event
Sep 25 21:21:15 ppp [opt9] Bundle: Status update: up 1 link, total bandwidth 64000 bps
Sep 25 21:21:15 ppp [opt9_link0] Link: Join bundle "opt9"
Sep 25 21:21:15 ppp [opt9_link0] Link: Matched action 'bundle "opt9" ""'
Sep 25 21:21:15 ppp [opt9_link0] LCP: authorization successful
Sep 25 21:21:15 ppp [opt9_link0] PAP: rec'd ACK #1 len: 5
Sep 25 21:21:15 ppp [opt9_link0] LCP: LayerUp
Sep 25 21:21:15 ppp [opt9_link0] PAP: sending REQUEST #1 len: 29
```
-
Akismet is giving me pains...
So, it can be replicated on a physical box (an Intel NUC).
-
Can you attempt to replicate it without using VLANs? Or is that a hard upstream requirement?
I'm curious because PPPoE over VLANs has a bit more to it than PPPoE on its own. I don't have a test setup with multiple PPPoE interfaces that uses VLANs, just regular interfaces.
-
Since this is a production environment, I'm doing things remotely while there are no operations at the moment.
Obviously I cannot move things physically, and the NUC box has only one physical interface, so I can't just plug in a cable.
Since virtualisation is ruled out, it has to be the VLANs.
I presume I'm not the only one with more than one PPPoE interface, and since it's easy to reproduce, I would have expected it to surface during the beta phase too.
What else could I look at?
-
Nothing else at the moment. You are certainly not the only person with multiple PPPoE interfaces; having those along with VLANs is less common, though not exactly rare. The need to fiddle with them and disable them is probably not something most people have on a regular basis, though.
If they all work fine until you disable one, just don't disable them and they'll be OK. Not ideal, but at least until the cause can be isolated they still function.
-
@jimp
I just tested rebooting one of the CPE devices. Thankfully the PPPoE sessions came back (on all three pfSense instances) with no further issues.
However, my situation is a bit worse, since one of the PPPoE connections doesn't support multiple logins, and dial on demand is not really an option in a multi-WAN environment.
So when it ends up being active on the wrong instance I need to restart it, hence the issue at hand. Obviously rebooting isn't really an option.
I have scheduled a change to a WAN CARP setup this week, so I will be needing this very soon.
I'm reverting to 2.4.3 and will keep the physical box at 2.4.4 for test purposes.
Is there a ticket opened for this so I can monitor the issue?
Kind regards
-
We can't realistically open a ticket for this until we know more about it, at least a way to replicate it in a lab.
Though your use case sounds very unusual, I wouldn't expect a PPPoE session to possibly be active on an incorrect instance unless something else was wrong with your setup.
-
Replicating is straightforward: just set up a PPPoE session over a VLAN. Supposedly that is all that is needed (I'll try to test this soon too).
As for the active PPPoE session, it is irrelevant to the situation we are facing here.
Having said that, there is nothing strange about it either.
As per the HA recommendations, PPP interfaces should be configured with dial on demand, an idle timeout and no monitoring on the secondary node.
This works fine until multi-WAN comes into play. Without gateway monitoring, pfSense can't tell whether the link works, which creates issues, especially when the PPPoE interface belongs to more than one gateway group and is used for redundancy and traffic policy.
If the secondary unit becomes master, there is no easy way to bring up dial on demand, because it is a lower-priority interface. When it does come up, the idle timeout doesn't trigger easily.
It would be possible if there was an option to tie a PPPoE interface to the CARP status of the LAN interface.
I understand that it can be scripted, but maintaining such scripts across releases is not exactly straightforward in the long run.
-
I believe I’ve got the same issue.
I’m running pfSense on a multi-NIC Intel Atom box. It’s got Realtek interfaces (“re” drivers). I’ve also got another box running an i3 CPU with Intel interfaces (“igb” drivers).
My WAN connections are PPPoE. The PPPoE interface is set up on a VLAN on the NIC.
The other day, I noticed my LAN interface didn’t have an IPv6 address (I’ve got IPv6-pd through my isp with track interface). So I just went to interfaces > LAN, and clicked “Save” then “Apply” without changing any settings. When I did that the whole router stopped passing traffic. I couldn’t access it from the WAN end, nor could I from the LAN end. I gracefully powered it down by pressing the power button which would’ve caused an ACPI shutdown, then powered it back up. When it came back up I checked the logs.
This message repeats many times
Nov 18 13:46:15 ppp [wan_link0] Link: reconnection attempt 14916 in 2 seconds
Nov 18 13:46:15 ppp [wan_link0] LCP: Down event
Nov 18 13:46:15 ppp [wan_link0] Link: DOWN event
Nov 18 13:46:15 ppp [wan_link0] can't remove hook mpd8627-0 from node "[12]:": No such file or directory
Nov 18 13:46:15 ppp [wan_link0] PPPoE: can't connect "[12]:"->"mpd8627-0" and "[10]:"->"left": No such file or directory
Nov 18 13:46:15 ppp [wan_link0] Link: reconnection attempt 14915
Nov 18 13:46:13 ppp [wan_link0] Link: reconnection attempt 14915 in 2 seconds
Nov 18 13:46:13 ppp [wan_link0] LCP: Down event
Nov 18 13:46:13 ppp [wan_link0] Link: DOWN event
Nov 18 13:46:13 ppp [wan_link0] can't remove hook mpd8627-0 from node "[12]:": No such file or directory
Nov 18 13:46:13 ppp [wan_link0] PPPoE: can't connect "[12]:"->"mpd8627-0" and "[10]:"->"left": No such file or directory
Nov 18 13:46:13 ppp [wan_link0] Link: reconnection attempt 14914
Nov 18 13:46:10 ppp [wan_link0] Link: reconnection attempt 14914 in 3 seconds
Nov 18 13:46:10 ppp [wan_link0] LCP: Down event
Nov 18 13:46:10 ppp [wan_link0] Link: DOWN event
Nov 18 13:46:10 ppp [wan_link0] can't remove hook mpd8627-0 from node "[12]:": No such file or directory
Nov 18 13:46:10 ppp [wan_link0] PPPoE: can't connect "[12]:"->"mpd8627-0" and "[10]:"->"left": No such file or directory
Nov 18 13:46:10 ppp [wan_link0] Link: reconnection attempt 14913
Nov 18 13:46:09 ppp [wan_link0] Link: reconnection attempt 14913 in 1 seconds
Nov 18 13:46:09 ppp [wan_link0] LCP: Down event
Nov 18 13:46:09 ppp [wan_link0] Link: DOWN event
Is this a bug?
-
@breakaway It sure is; however, Netgate doesn't want to accept it, since they cannot replicate it in a lab. Any luck replicating this?
-
I have replicated it twice. Once on a device with Realtek nics (re0, re1 etc) and once on an Intel NIC device (igb0,igb1 etc).
All you have to do is go to a LAN interface, click "Save" and "Apply". Internet goes away almost immediately. See screenshots below.
I think it is a requirement to have PPPoE interface ON TOP of a VLAN interface to make this happen. This issue does not exist if your WAN is Static IP.
If I go Status > Interfaces and hit "connect" - that seems to bring it back. But this could be a real issue if I made a change on my LAN interface while I wasn't at home.
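To check whether a given box matches the suspected condition (PPPoE on top of a VLAN), the mpd config can be scanned for its `set pppoe iface` lines. This is only a sketch: it assumes VLAN sub-interfaces are named `parent.tag`, as in the `vtnet2.105` and `igb0.6` examples in this thread, and the function name `pppoe_interfaces` is made up for illustration.

```python
import re

# Matches "set pppoe iface <name>" lines in an mpd config file.
PPPOE_IFACE = re.compile(
    r"^[ \t]*set[ \t]+pppoe[ \t]+iface[ \t]+(\S+)", re.MULTILINE
)

def pppoe_interfaces(conf_text):
    """Return (interface name, VLAN tag or None) for each PPPoE link in the config."""
    found = []
    for name in PPPOE_IFACE.findall(conf_text):
        _parent, dot, tag = name.partition(".")
        # Assumption: a numeric suffix after a dot means a VLAN sub-interface.
        found.append((name, int(tag) if dot and tag.isdigit() else None))
    return found

if __name__ == "__main__":
    sample = '\tset pppoe service ""\n\tset pppoe iface vtnet2.105\n'
    for name, tag in pppoe_interfaces(sample):
        kind = f"VLAN {tag}" if tag is not None else "no VLAN"
        print(f"{name}: {kind}")
```

Feeding it the contents of each mpd config file from /var/etc/ would list which PPPoE links sit on a tagged interface, which is the detail that seems to matter for reproducing this.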
-
Got same trouble with pppoe session not recovering after failing on applying changes.
My instance:
PfSense 2.4.4-RELEASE (amd64) built on Thu Sep 20 09:03:12 EDT 2018 FreeBSD 11.2-RELEASE-p3
KVM VM with pass-through cards IG0
VLAN6: IG0.6
WAN: PPPOE1(igb0.6) Movistar provider with user and pass, service name null
Other settings:
Hardware Checksum Offloading Disabled
Hardware TCP Segmentation Offloading Disabled
Hardware Large Receive Offloading Disabled
DNS 1.1.1.1 assigned to WAN GW on DNS Server Settings
When I apply changes on assignments or interface configuration, it just goes down.
If I'm on local management of the firewall, the process takes a long time to complete; in other menus it is fast.
Hope it helps to replicate.
-
Ran into the same issue when also updating track interface on LAN, just as @breakaway described. I've been bitten by it once before when setting up a new Ipsec VTI. It did not reconnect and was stuck, had to reboot the router to get things working smoothly again.
-
Hi everybody, I have the same problem on my pfSense 2.4.4 VM on top of ESX. I have 5 PPPoE WANs on a VLAN interface (one always left disabled). Yesterday, after disabling one of them (because of a stuck modem), the other three went down and I lost remote access. Bringing them up manually (when I came home) didn't help either; they wouldn't stay up. Rebooting brought everything back to an operational state.
The ppp logs are very similar to the OP's. I never had this issue in the past, but I don't know whether it is a problem of v2.4.4 or of the combination of PPP and VLANs. I migrated to VLANs so that I could add a new PPP WAN connection without rebooting the firewall; before that I had a virtual interface for every single PPP/WAN interface.
-
Hmm, I am seeing this now although I did not initially after upgrading. My PPPoE interfaces usually stay up for months.
What exactly did you do to trigger this?
Whilst testing something else I re-saved a WAN connection and it took down both PPPoE connections with the same errors logged:
Nov 21 16:46:14 ppp [opt4_link0] Link: reconnection attempt 17 in 2 seconds
Nov 21 16:46:14 ppp [opt4_link0] LCP: Down event
Nov 21 16:46:14 ppp [opt4_link0] Link: DOWN event
Nov 21 16:46:14 ppp [opt4_link0] can't remove hook mpd98452-0 from node "[26]:": No such file or directory
Nov 21 16:46:14 ppp [opt4_link0] PPPoE: can't connect "[26]:"->"mpd98452-0" and "[17e]:"->"left": No such file or directory
Nov 21 16:46:14 ppp [opt4_link0] Link: reconnection attempt 16
Nov 21 16:46:13 ppp [wan_link0] Link: reconnection attempt 25 in 1 seconds
Nov 21 16:46:13 ppp [wan_link0] LCP: Down event
Nov 21 16:46:13 ppp [wan_link0] Link: DOWN event
Nov 21 16:46:13 ppp [wan_link0] can't remove hook mpd17126-0 from node "[1b]:": No such file or directory
Nov 21 16:46:13 ppp [wan_link0] PPPoE: can't connect "[1b]:"->"mpd17126-0" and "[4e]:"->"left": No such file or directory
Nov 21 16:46:13 ppp [wan_link0] Link: reconnection attempt 24
However I was able to bring them back up by simply clicking connect in Status > Interfaces.
I am also running over VLANs.
Steve
-
@stephenw10 said in pfsense 2.4.4 fails all pppoe's after disabling one:
Hmm, I am seeing this now although I did not initially after upgrading. My PPPoE interfaces usually stay up for months.
What exactly did you do to trigger this?
I just click “save” on my LAN interface, then click apply. Don’t even have to change any settings.
Yes, going to interfaces > connect fixes it, but what if you need to make a change on a remote pfSense?
-
I can confirm that this is exactly the case. Hitting reconnect brings it back.
However, on a system with multiple ppp connections, vpn over them etc, a simple change somewhere brings down many things. And obviously, if you are remote, over ppp, things get very very serious.
And bye bye to ppp interfaces staying up for months too :(
The NIC make doesn't matter, nor whether it is physical or virtual; it happens across the board.
What strikes me is how few people are using PPP over VLANs.
Can we expect a ticket? A quick fix?