PPPoE problems after upgrading from 2.4.2 to 2.4.4



  • Hi,

    I took the plunge today and upgraded from pfSense 2.4.2 to the 2.4.4 release.
    Upgrade process itself went fine but after the reboot my pfSense had no internet connection.

    pfSense box is directly connected with an ethernet cable to the fiber termination point of my provider.
    NIC in the pfSense is an Intel Pro quadport (igb0) and I'm using PPPoE on a VLAN from my provider.
    It seems that the ppp-daemon for whatever reason can't reach the upstream provider.

    clog /var/log/ppp.log
    Sep 29 15:30:17 p ppp: process 43663 started, version 5.8 (nobody@pfSense_v2_4_4_amd64-pfSense_v2_4_4-job-04 23:36  4-Sep-2018)
    Sep 29 15:30:17 p ppp: web: web is not running
    Sep 29 15:30:17 p ppp: [wan] Bundle: Interface ng0 created
    Sep 29 15:30:17 p ppp: [wan_link0] Link: OPEN event
    Sep 29 15:30:17 p ppp: [wan_link0] LCP: Open event
    Sep 29 15:30:17 p ppp: [wan_link0] LCP: state change Initial --> Starting
    Sep 29 15:30:17 p ppp: [wan_link0] LCP: LayerStart
    Sep 29 15:30:17 p ppp: [wan_link0] PPPoE: Set PPP-Max-Payload to '1500'
    Sep 29 15:30:17 p ppp: [wan_link0] PPPoE: Connecting to ''
    Sep 29 15:30:26 p ppp: [wan_link0] PPPoE connection timeout after 9 seconds
    Sep 29 15:30:26 p ppp: [wan_link0] Link: DOWN event
    Sep 29 15:30:26 p ppp: [wan_link0] LCP: Down event
    

    It then repeats the PPPoE negotiation again and again.
    Looking at the logs from before the upgrade it should have looked something like this:

    Sep 27 18:57:08 p ppp: process 5305 started, version 5.8 (nobody@pfSense_v2_4_2_amd64-pfSense_v2_4_2-job-14 18:47 16-Nov-2017)
    Sep 27 18:57:08 p ppp: web: web is not running
    Sep 27 18:57:08 p ppp: [wan] Bundle: Interface ng0 created
    Sep 27 18:57:08 p ppp: [wan_link0] Link: OPEN event
    Sep 27 18:57:08 p ppp: [wan_link0] LCP: Open event
    Sep 27 18:57:08 p ppp: [wan_link0] LCP: state change Initial --> Starting
    Sep 27 18:57:08 p ppp: [wan_link0] LCP: LayerStart
    Sep 27 18:57:08 p ppp: [wan_link0] PPPoE: Set PPP-Max-Payload to '1500'
    Sep 27 18:57:08 p ppp: [wan_link0] PPPoE: Connecting to ''
    Sep 27 18:57:08 p ppp: PPPoE: rec'd ACNAME "x.x.x.x"
    Sep 27 18:57:08 p ppp: [wan_link0] PPPoE: rec'd PPP-Max-Payload '1500'
    Sep 27 18:57:08 p ppp: [wan_link0] PPPoE: connection successful
    

    But I never reach that stage.
    Rolling back to the 2.4.2 release fixes my internet connection and re-upgrading breaks it again.
    I also tried removing the complete config from the 2.4.4 installation and redoing all the steps for the PPPoE connection, but without any luck.

    I found this recent bug on the tracker but it seems to have been resolved, so I should not be hitting it.
    https://redmine.pfsense.org/issues/8603

    Does anyone have a tip on how to continue debugging the issue?

    Thanks!
    Peter
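
A minimal sketch for summarizing a saved copy of the ppp log, assuming the output of `clog /var/log/ppp.log` has been copied into a text file or string. The two patterns are taken from the log lines quoted in this post; the function name `summarize_pppoe` is hypothetical. It only counts discovery timeouts against successful connections, which makes a reconnect loop easy to spot in a busy log.

```python
import re

# Patterns copied from the mpd5 log lines quoted in the post above.
TIMEOUT = re.compile(r"PPPoE connection timeout after (\d+) seconds")
SUCCESS = re.compile(r"PPPoE: connection successful")


def summarize_pppoe(log_text):
    """Count PPPoE discovery timeouts and successful connections."""
    timeouts = 0
    successes = 0
    for line in log_text.splitlines():
        if TIMEOUT.search(line):
            timeouts += 1
        elif SUCCESS.search(line):
            successes += 1
    return {"timeouts": timeouts, "successes": successes}


# Three lines from the failing 2.4.4 log quoted above.
SAMPLE = """\
Sep 29 15:30:17 p ppp: [wan_link0] PPPoE: Connecting to ''
Sep 29 15:30:26 p ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Sep 29 15:30:26 p ppp: [wan_link0] Link: DOWN event
"""

if __name__ == "__main__":
    print(summarize_pppoe(SAMPLE))
```

A log with many timeouts and no successes suggests the PPPoE discovery frames are never answered, which is where a packet capture on the parent interface would help.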


  • Netgate Administrator

    Try running at the command line, or in Diag > Command Prompt:
    ifconfig igb0 promisc

    Assuming igb0 is the parent of the VLAN you are running PPPoE on. I saw one other case where that allowed the connection to complete but it's not yet known why.

    Steve



  • Hi Steve,

    Thanks for the suggestion. I just tried it, but unfortunately I get the same behavior.
    I'm out today but I can make a packet capture tomorrow if that helps with debugging.

    -peter


  • Netgate Administrator

    Yes, I would assign the parent interface and then run a packet capture there. You should be able to see both the PPP session and the VLAN tagging in that pcap to check they are correct.

    Steve



  • @pbosgraaf

    I may have the same issue; see my post (https://forum.netgate.com/topic/136174/update-to-2-4-4-failed-in-hyper-v-2012r2).

    I am also running PPPoE and nothing can connect to the internet. I was also getting ALTQ errors with 2.4.4, which I was not getting with 2.4.3. I hadn't twigged it could be a PPPoE issue as well.

    I have also noticed in 2.4.3 that each time the router restarts, the PPPoE connection goes down about 30 seconds after initially coming up and needs to be manually restarted. I wonder what has changed with these recent releases.

    Ian



  • This seems to be an upgrade issue, not necessarily an issue with 2.4.4. I had the same issue today. Even re-installing pfSense while keeping the config didn't fix it. However, re-installing a clean version of pfSense 2.4.4 and then setting up my PPPoE again worked just fine. I'm now restoring piece after piece from my old config (except the interface configuration).


  • Netgate Administrator

    I'd be very interested if you find something in the old config that is breaking it.

    Can you share the WAN and PPP sections? Omitting logins of course.

    Steve



  • It looks like there are multiple issues in my old config (judging from which restore options break the PPPoE).

    1. issue: My interface setup

    <interfaces>
    		<wan>
    			<if>pppoe0</if>
    			<blockpriv></blockpriv>
    			<blockbogons></blockbogons>
    			<descr><![CDATA[WAN]]></descr>
    			<spoofmac>00:00:00:00:00:00</spoofmac>
    			<enable></enable>
    			<ipaddr>pppoe</ipaddr>
    		</wan>
    		<lan>
    			<enable></enable>
    			<if>igb1</if>
    			<descr><![CDATA[LAN]]></descr>
    			<spoofmac></spoofmac>
    			<ipaddr>10.12.10.1</ipaddr>
    			<subnet>24</subnet>
    		</lan>
    		<opt1>
    			<descr><![CDATA[OPT1]]></descr>
    			<if>igb2</if>
    			<spoofmac></spoofmac>
    			<enable></enable>
    			<ipaddr>10.12.20.1</ipaddr>
    			<subnet>24</subnet>
    		</opt1>
    		<opt2>
    			<descr><![CDATA[VPN-OUT]]></descr>
    			<if>ovpnc2</if>
    			<enable></enable>
    			<spoofmac></spoofmac>
    		</opt2>
    		<opt3>
    			<descr><![CDATA[VPN-IN]]></descr>
    			<if>ovpns1</if>
    			<enable></enable>
    			<ipaddr>10.12.30.1</ipaddr>
    			<subnet>24</subnet>
    			<spoofmac></spoofmac>
    		</opt3>
    	</interfaces>
    
    <ppps>
    		<ppp>
    			<ptpid>0</ptpid>
    			<type>pppoe</type>
    			<if>pppoe0</if>
    			<ports>igb0.7</ports>
    			<username>****</username>
    			<password>****</password>
    			<provider>****</provider>
    			<bandwidth></bandwidth>
    			<mtu></mtu>
    			<mru></mru>
    			<mrru></mrru>
    		</ppp>
    	</ppps>
    	<vlans>
    		<vlan>
    			<if>igb0</if>
    			<tag>7</tag>
    			<pcp></pcp>
    			<descr><![CDATA[VDSL]]></descr>
    			<vlanif>igb0.7</vlanif>
    		</vlan>
    	</vlans>
    

    opt3 seems to cause the issue, since even setting the fresh install up in this way breaks the PPPoE immediately. This worked fine in 2.4.3-p1 and was mainly used to serve DNS to OpenVPN clients.

    2. issue: My packages config (PfBlockerNG 2.1.4_13, OpenVPN Client Export Utility 1.4.17_2, Traffic Totals 1.2.4)
    No idea what's the issue here and the complete configs would be a little much, but importing the package config also seems to break the PPPoE. I haven't gotten around to setting this up in my fresh install so I don't (yet) know whether setting it up without the config will also break PPPoE.

    Imports that worked without any issues: Aliases, DNS Resolver, DHCP Server, DHCPv6 Server, System, Wake-on-LAN
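
A rough sketch of one consistency check that can be run against a config like the one above before importing it: every VLAN port named in a `<ppp>` entry should have a matching `<vlanif>` under `<vlans>`. This is an illustration only, not something pfSense does itself. The function name `check_ppp_ports` is hypothetical, the sample is a trimmed copy of the sections posted here wrapped in a single root element, and splitting `<ports>` on commas is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sections posted above, wrapped in one root element.
SAMPLE = """\
<pfsense>
  <interfaces>
    <wan><if>pppoe0</if><ipaddr>pppoe</ipaddr></wan>
  </interfaces>
  <ppps>
    <ppp><type>pppoe</type><if>pppoe0</if><ports>igb0.7</ports></ppp>
  </ppps>
  <vlans>
    <vlan><if>igb0</if><tag>7</tag><vlanif>igb0.7</vlanif></vlan>
  </vlans>
</pfsense>
"""


def check_ppp_ports(config_xml):
    """Return a list of PPP ports that look like VLANs but are not defined."""
    root = ET.fromstring(config_xml)
    vlan_ifs = {v.findtext("vlanif") for v in root.iterfind("./vlans/vlan")}
    problems = []
    for ppp in root.iterfind("./ppps/ppp"):
        name = ppp.findtext("if")
        # Assumption: multiple ports, if present, are comma-separated.
        for port in (ppp.findtext("ports") or "").split(","):
            port = port.strip()
            if "." in port and port not in vlan_ifs:
                problems.append(f"{name}: port {port} has no matching <vlanif>")
    return problems


if __name__ == "__main__":
    print(check_ppp_ports(SAMPLE) or "PPP ports match the defined VLANs")
```

An empty result only rules out a dangling VLAN reference; it says nothing about opt3 or the package configs mentioned above.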


  • Netgate Administrator

    Hmm, one thing that might be causing this is the new gateway handling in 2.4.4.

    Does your PPPoE connection fail to connect or just fail to route traffic?

    Check it is the default gateway in System > Routing > Gateways when it fails. Set it specifically as default if it is not.

    Steve



  • It does not connect at all:

    Oct 3 15:58:43	ppp		[wan_link0] Link: reconnection attempt 1 in 3 seconds
    Oct 3 15:58:46	ppp		[wan_link0] Link: reconnection attempt 1
    Oct 3 15:58:46	ppp		[wan_link0] PPPoE: can't connect "[13]:"->"mpd9828-0" and "[11]:"->"left": No such file or directory
    Oct 3 15:58:46	ppp		[wan_link0] can't remove hook mpd9828-0 from node "[13]:": No such file or directory
    Oct 3 15:58:46	ppp		[wan_link0] Link: DOWN event
    Oct 3 15:58:46	ppp		[wan_link0] LCP: Down event
    

  • Netgate Administrator

    At the command line or in Diag > Command Prompt try running:
    clog /var/log/ppp.log | grep secret

    You might need to do that after rebooting as your ppp log will be exceptionally busy.

    If it reports a missing secret try re-creating the ppp interface to generate the file again. We have seen that with one other user so far.

    Steve



  • Just try rebooting.
    I have the same error which goes away after reboot.
    No issues in 2.4.3
    https://forum.netgate.com/topic/135920/pfsense-2-4-4-fails-all-pppoe-s-after-disabling-one/4



  • @netblues Rebooting does fix it, thanks. However, it doesn't seem ideal to have to reboot anytime you assign a (virtual) interface.



  • Try disabling and re-enabling; you will probably end up with the same issue. The thing is, no one has a clue what is causing this.



  • @netblues It actually looks like just ANY change to ANY interface will cause these PPPoE issues. I just changed my OPT1 static IP address and even that killed my PPPoE until I rebooted.


  • Netgate Administrator

    Did you try running that command? I assume it did not log any errors there?

    Try running ifconfig -av when it's working and when it's not. See if there's anything different there.

    It's hard to believe it would change when altering another interface but you could also check the generated file at /var/etc/mpd_wan.conf. Does that change?

    Steve
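
One way to do the comparison suggested above without reading two long outputs side by side is to save each capture to a text file and diff them. The sketch below uses Python's standard `difflib`; the function name `diff_captures` is hypothetical, and it works on any two text captures (for example `ifconfig -av` output, or two copies of /var/etc/mpd_wan.conf).

```python
import difflib


def diff_captures(working, broken):
    """Return unified-diff lines between two text captures, without context."""
    return list(
        difflib.unified_diff(
            working.splitlines(),
            broken.splitlines(),
            fromfile="working",
            tofile="broken",
            lineterm="",
            n=0,  # show only the lines that actually differ
        )
    )


def diff_files(working_path, broken_path):
    """Read two saved captures from disk and return their diff lines."""
    with open(working_path) as f_ok, open(broken_path) as f_bad:
        return diff_captures(f_ok.read(), f_bad.read())
```

An empty list means the two captures are identical. Lines starting with `-` appear only in the working capture and lines starting with `+` only in the broken one.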



  • @stephenw10 Thank you very much for your continued efforts. And just to be clear: My PPPoE connection works fine unless I change any setting of any interface - then I have to reboot.

    clog [...] secret did not result in any output.

    Here's what ifconfig -av returns when things are fine:

    [Screenshot from 2018-10-10 17-51-43: ifconfig -av output, PPPoE working]

    And when PPPoE is broken:

    [Screenshot from 2018-10-10 17-51-53: ifconfig -av output, PPPoE broken]

    The conf file does not change.

    [Sorry for posting screenshots, the anti-spam filter would not let me post the output as text]



  • Please do something about Akismet. It doesn't allow posting command output.

    Situation is exactly the same over here.


  • Rebel Alliance Developer Netgate

    @netblues said in PPPoE problems after upgrading from 2.4.2 to 2.4.4:

    Please do something about Akismet. It doesn't allow posting command output.

    Situation is exactly the same over here.

    1. Wrap the output in code tags
      or
    2. Attach long output as a text file, rather than dumping it all in the post

    Both are good forum etiquette and will avoid issues with posts being flagged as spam.



  • Unfortunately it's the tagged code text that is not allowed by Akismet.
    Heuristic analysis of what is spam and what is not fails miserably.
    Will try attachments next time.

    On topic, I have 3 instances: one is a physical box running on Intel NUC hardware and two are virtual under CentOS KVM.
    All three of them talk to the same bridge modem devices via vlan trunks.
    The two virtual ones have been reverted to 2.4.3p1 and work fine.
    The physical one (on 2.4.4, updated from 2.4.3p1) needs to reboot every time something changes on ppp
    (provider allows three concurrent ppp calls)
    The only purpose of the physical box is to test 2.4.4 issues. Unfortunately, since it only has one physical lan, can't test without vlans.
    On the other hand, if everyone had issues with simple pppoe there would be lots of complain threads by now.
    I suspect it has to do with permissions and/or file ownership, which is corrected upon reboot. If it was a config issue, rebooting would never fix it.
    Any pointers on what to look for?



  • I'm astonished that there are not more posts about this PPPoE problem on 2.4.4. Maybe there are not so many people using PPPoE after all?
    Since upgrading from 2.4.3 to 2.4.4, my PPPoE doesn't work anymore either. I also get weird errors like yours, stating "file not found" or "not allowed to allocate ip address". Reboot doesn't fix it though, sadly.

    It's super annoying because I had to revert back to 2.3.3 and I cannot use any package anymore because I get the following error when trying to install one :

    WARNING: Current pkg repository has a new PHP major
             version. pfSense should be upgraded before
             installing any new package.
    Failed
    

    Did someone find out why the new PPPoE connection layer is failing for some and not for others?



  • Are you using vlans for pppoe?



  • @netblues No, it's not needed by the provider.



  • @joxxxx I'm not talking about the provider, just about VLANs as a means of adding more than one interface, via a managed Ethernet switch, on a machine that has only one.
    It seems the problem exists when VLANs are involved.
    And rebooting or just hitting connect seems to fix things.
    Perhaps you are facing a slightly different problem?



  • @netblues Oh ok, I thought you wanted to know if the connection itself had a special vlan (some providers ask for that).
    No, I don't use any vlans. I have 2 wan : 1 pppoe ftth and 1 cable. Only the pppoe fails on 2.4.4.

    It's maybe another problem, but I tried everything I could, nothing helped. It's really strange... The connection works flawlessly when directly wired to the pc, on vyOS, sophos, pf2.4.3 but not on pf2.4.4 or pf2.4.5. Something changed, but what ?



  • @joxxxx Have you tried recreating the ppp connection on pf 2.4.4 from scratch?



  • @netblues yes, tried first by updating, then by installing a fresh 2.4.4 and importing the settings and then on a fresh 2.4.4 and manually creating the pppoe connection. Didn't work with the same errors all the time.

    For info, I'm running it on esxi 6.5 with a quad gigabit eth card.



  • @joxxxx I suppose you have disabled all 3 hardware offload options on advanced/networking.
    (meaningless for a virtual network interface, by design...)

    And what are the exact errors?



  • @netblues Actually, Hardware TCP Segmentation Offloading and Hardware Large Receive Offloading were disabled, but not Hardware Checksum Offloading.

    The errors are :

    [opt1] IFACE: Adding IPv4 address to pppoe0 failed(IGNORING for now. This should be only for PPPoE friendly!): File exists 
    
    [opt1] IFACE: Removing IPv4 address from pppoe0 failed(IGNORING for now. This should be only for PPPoE friendly!): Can't assign requested address 
    

    It then cycles infinitely, each time getting a new wan ip and throwing the same errors.

    Thanks a lot for your help !



  • @joxxxx This is something else. Have you tried with hardware checksum offloading disabled?



  • @netblues Not yet, I'll boot up the 2.4.4 vm and test it out. (On 2.3.3 hardware offloading doesn't change anything).



  • Ok, just tried it and still the same. Here are some more logs (apart from the recurring "File exists" etc. from above):

    Nov 22 14:00:50 	ppp 		[opt1] IFACE: Up event
    Nov 22 14:00:50 	ppp 		[opt1] IFACE: Rename interface ng0 to pppoe0
    Nov 22 14:00:50 	ppp 		[opt1] IPCP: rec'd Configure Ack #3 (Ack-Sent)
    Nov 22 14:00:50 	ppp 		[opt1] IPADDR 188.XX.XX.XX
    Nov 22 14:00:50 	ppp 		[opt1] IPCP: state change Ack-Sent --> Opened
    Nov 22 14:00:50 	ppp 		[opt1] IPCP: LayerUp
    Nov 22 14:00:50 	ppp 		[opt1] 188.XX.XX.XX -> 10.0.0.1
    Nov 22 14:00:50 	ppp 		[opt1] IFACE: Adding IPv4 address to pppoe0 failed(IGNORING for now. This should be only for PPPoE friendly!): File exists 
    

    Strange thing is that it talks about ipv6 although I disabled it everywhere I could. It seems like on 2.4.3 it doesn't do that.

    Even updated to 2.4.5-dev to see, but still the same.



  • Strange as it is, it seems it manages to connect, negotiate, authenticate, get an IP and then fail.
    Any ideas why a seemingly public IP is being changed to a private one (10.0.0.1)? Does this happen on other tests?



  • @netblues 10.0.0.1 is the pf LAN address. I don't know why it does that actually. There are 3 phases to the connection process. First it's 0.0.0.0, then 10.0.0.1 and then the real IP address. Or sometimes it starts with 10.0.0.1, then 0.0.0.0 and then the real IP.

    10.0.0.1 seems to be taken as the pppoe gateway, maybe that's not "allowed" anymore in 2.4.4 ?
    On the dashboard, the gateway for pppoe says 10.0.0.1 and the interface has the correct ip.

    Here is the log of the 2.3.3 that works :

    Nov 22 14:05:26 	ppp 		[opt1] IFACE: Up event
    Nov 22 14:05:26 	ppp 		[opt1] IFACE: Rename interface ng0 to pppoe0
    Nov 22 14:05:26 	ppp 		[opt1] IPCP: rec'd Configure Nak #2 (Ack-Sent)
    Nov 22 14:05:26 	ppp 		[opt1] IPADDR 188.XX.XX.XX
    Nov 22 14:05:26 	ppp 		[opt1] 188.XX.XX.XX is OK
    Nov 22 14:05:26 	ppp 		[opt1] IPCP: SendConfigReq #3
    Nov 22 14:05:26 	ppp 		[opt1] IPADDR 188.XX.XX.XX
    Nov 22 14:05:26 	ppp 		[opt1] IPCP: rec'd Configure Ack #3 (Ack-Sent)
    Nov 22 14:05:26 	ppp 		[opt1] IPADDR 188.XX.XX.XX
    Nov 22 14:05:26 	ppp 		[opt1] IPCP: state change Ack-Sent --> Opened
    Nov 22 14:05:26 	ppp 		[opt1] IPCP: LayerUp
    Nov 22 14:05:26 	ppp 		[opt1] 188.XX.XX.XX-> 10.0.0.1 
    

    Nov 22 14:05:26 ppp [opt1] 188.XX.XX.XX is OK : This seems to be the line that fails on 2.4.4 with "File exists".
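
Both logs contain a line of the form `[opt1] <local> -> <peer>`, where the second address is what the provider hands out as the remote end of the link. A small sketch for pulling those pairs out of a saved log, so the peer address can be compared against local addresses; the regular expression is written against the lines quoted in this thread and the function name `peer_addresses` is hypothetical.

```python
import re

# Matches "[bundle] <local> -> <peer>" as it appears in the logs quoted above.
# State-change lines such as "Ack-Sent --> Opened" do not match, because the
# first token after the bundle name is not directly followed by "->".
ADDR_LINE = re.compile(r"\[(\w+)\]\s+(\S+?)\s*->\s*(\S+)")


def peer_addresses(log_text):
    """Return (bundle, local, peer) tuples found in mpd5 log text."""
    found = []
    for line in log_text.splitlines():
        match = ADDR_LINE.search(line)
        if match:
            found.append(match.groups())
    return found


# Lines taken from the logs quoted above (addresses masked by the poster).
SAMPLE = """\
Nov 22 14:00:50 ppp [opt1] IPCP: state change Ack-Sent --> Opened
Nov 22 14:00:50 ppp [opt1] IPCP: LayerUp
Nov 22 14:00:50 ppp [opt1] 188.XX.XX.XX -> 10.0.0.1
"""

if __name__ == "__main__":
    for bundle, local, peer in peer_addresses(SAMPLE):
        print(f"{bundle}: local {local}, peer {peer}")
```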



  • @netblues said in PPPoE problems after upgrading from 2.4.2 to 2.4.4:

    Any ideas why a seemingly public IP is being changed to a private one (10.0.0.1)? Does this happen on other tests?

    It's not changing the IP, it's trying to add the IP with a default gateway of 10.0.0.1. So check your gateway settings: is the gateway for WAN set to dynamic, and do you have any other gateways on your system? Make sure you have either selected the right default gateway or a properly configured gateway group if you have multiple WAN gateways.

    Edit: and if your PPPoE provider really uses 10.0.0.1 as their gateway address you can't use that on your LAN.



  • @joxxxx There is new functionality dealing with gateways for the box.
    Please check it (in System > Routing > Gateways).
    It's not right to have the LAN IP as a PPP gateway.



  • @grimson I have indeed 2 WAN. The main cable network that gets the info through DHCP (provider router in gateway mode) and the fiber 1Gbit ftth through pppoe (OPT1).
    Gateways are dynamic and both wan are in a gateway group with the fiber set as Tier 1 and the cable as Tier 2.

    I don't think the provider uses 10.0.0.1 as gateway, but pfsense sees it that way. The thing is, why does it work on 2.3.3 and not on 2.4.4 and 2.4.5 ?



  • Show the whole PPP log, not just a small part of it. As for why it doesn't work on 2.4.4, mpd5 has received quite a few updates since then and it may very well check the validity of the gateway IP now.

    @joxxxx said in PPPoE problems after upgrading from 2.4.2 to 2.4.4:

    I don't think the provider uses 10.0.0.1 as gateway, but pfsense sees it that way.

    If you get 10.0.0.1 as gateway on 2.3.3 too, and the connection works then yes your provider is using that as the gateway address.


  • Netgate Administrator

    If they are actually using 10.0.0.1 as the gateway, which is possible, then you should not be using that subnet internally.

    It may work as it will probably be /32 and hence a more specific route but eliminating that potential conflict would be a good test.

    Steve
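
The conflict described above can be checked with Python's standard `ipaddress` module: does the gateway address handed out by the provider fall inside any subnet used internally? This is a minimal sketch. The thread only states that the LAN address is 10.0.0.1, so the /24 mask used in the example is an assumption, and the function name `gateway_conflicts` is hypothetical.

```python
import ipaddress


def gateway_conflicts(gateway, local_networks):
    """Return the names of local networks that contain the gateway address.

    local_networks maps an interface name to an address in CIDR form,
    e.g. {"LAN": "10.0.0.1/24"}.
    """
    gw = ipaddress.ip_address(gateway)
    return [
        name
        for name, cidr in local_networks.items()
        # strict=False accepts a host address such as 10.0.0.1/24
        if gw in ipaddress.ip_network(cidr, strict=False)
    ]


if __name__ == "__main__":
    # Assumption: the LAN uses a /24 mask; only the address is given above.
    print(gateway_conflicts("10.0.0.1", {"LAN": "10.0.0.1/24"}))
```

Any name in the result points at an internal subnet that overlaps the PPPoE gateway and is a candidate for renumbering.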



  • Oh god, I feel stupid now... You guys nailed it, the fiber modem somehow uses 10.0.0.1 as the actual PPPoE gateway!
    I changed the lan network to something else and the connection worked immediately after.

    Thanks a thousand times for your help ! Finally it's fixed. I never thought they'd use this network for their actual gateway.

    So, good to know, 2.4.4+ makes pppoe more strict ! :)