PPPoE problems after upgrading from 2.4.2 to 2.4.4



  • Hi,

    I took the plunge today and upgraded from pfSense 2.4.2 to the 2.4.4 release.
    The upgrade process itself went fine, but after the reboot my pfSense box had no internet connection.

    The pfSense box is connected directly with an Ethernet cable to my provider's fiber termination point.
    The NIC in the pfSense box is an Intel PRO quad-port (igb0), and I'm using PPPoE on a VLAN from my provider.
    For whatever reason, the PPP daemon seems unable to reach the upstream provider.

    clog /var/log/ppp.log
    Sep 29 15:30:17 p ppp: process 43663 started, version 5.8 (nobody@pfSense_v2_4_4_amd64-pfSense_v2_4_4-job-04 23:36  4-Sep-2018)
    Sep 29 15:30:17 p ppp: web: web is not running
    Sep 29 15:30:17 p ppp: [wan] Bundle: Interface ng0 created
    Sep 29 15:30:17 p ppp: [wan_link0] Link: OPEN event
    Sep 29 15:30:17 p ppp: [wan_link0] LCP: Open event
    Sep 29 15:30:17 p ppp: [wan_link0] LCP: state change Initial --> Starting
    Sep 29 15:30:17 p ppp: [wan_link0] LCP: LayerStart
    Sep 29 15:30:17 p ppp: [wan_link0] PPPoE: Set PPP-Max-Payload to '1500'
    Sep 29 15:30:17 p ppp: [wan_link0] PPPoE: Connecting to ''
    Sep 29 15:30:26 p ppp: [wan_link0] PPPoE connection timeout after 9 seconds
    Sep 29 15:30:26 p ppp: [wan_link0] Link: DOWN event
    Sep 29 15:30:26 p ppp: [wan_link0] LCP: Down event
    

    After that it repeats the PPPoE negotiation again and again.
    Judging by the logs from before the upgrade, it should look something like this:

    Sep 27 18:57:08 p ppp: process 5305 started, version 5.8 (nobody@pfSense_v2_4_2_amd64-pfSense_v2_4_2-job-14 18:47 16-Nov-2017)
    Sep 27 18:57:08 p ppp: web: web is not running
    Sep 27 18:57:08 p ppp: [wan] Bundle: Interface ng0 created
    Sep 27 18:57:08 p ppp: [wan_link0] Link: OPEN event
    Sep 27 18:57:08 p ppp: [wan_link0] LCP: Open event
    Sep 27 18:57:08 p ppp: [wan_link0] LCP: state change Initial --> Starting
    Sep 27 18:57:08 p ppp: [wan_link0] LCP: LayerStart
    Sep 27 18:57:08 p ppp: [wan_link0] PPPoE: Set PPP-Max-Payload to '1500'
    Sep 27 18:57:08 p ppp: [wan_link0] PPPoE: Connecting to ''
    Sep 27 18:57:08 p ppp: PPPoE: rec'd ACNAME "x.x.x.x"
    Sep 27 18:57:08 p ppp: [wan_link0] PPPoE: rec'd PPP-Max-Payload '1500'
    Sep 27 18:57:08 p ppp: [wan_link0] PPPoE: connection successful
    

    But I never reach that stage.
    Rolling back to the 2.4.2 release restores my internet connection, and re-upgrading breaks it again.
    I also tried removing the complete config from the 2.4.4 installation and redoing all the steps for the PPPoE connection, but without any luck.
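
    In the two excerpts above, the failing attempts never log a "rec'd ACNAME" line before the 9-second timeout, which suggests that no access concentrator answered at all. A minimal Python sketch for summarizing every attempt in a saved log follows, assuming the output of clog /var/log/ppp.log was written to a plain text file. The regular expressions only match the message formats visible in these excerpts (other mpd messages are ignored), and the function name is hypothetical.

```python
import re

# Message formats copied from the ppp.log excerpts in this thread (mpd 5.8).
# Anything else mpd logs (LCP, auth, netgraph errors) is ignored by this sketch.
CONNECTING = re.compile(r"PPPoE: Connecting to")
ACNAME = re.compile(r'PPPoE: rec\'d ACNAME "([^"]*)"')
SUCCESS = re.compile(r"PPPoE: connection successful")
TIMEOUT = re.compile(r"PPPoE connection timeout after (\d+) seconds")


def classify_attempts(lines):
    """Group log lines into PPPoE attempts and record how each one ended.

    Returns a list of dicts with keys "outcome" ("pending", "connected" or
    "timeout (Ns)") and "acname" (the access concentrator name, if one replied).
    """
    attempts = []
    current = None
    for line in lines:
        if CONNECTING.search(line):
            # A new discovery attempt begins.
            current = {"outcome": "pending", "acname": None}
            attempts.append(current)
        elif current is None:
            continue  # lines logged before the first attempt
        elif m := ACNAME.search(line):
            current["acname"] = m.group(1)
        elif SUCCESS.search(line):
            current["outcome"] = "connected"
        elif m := TIMEOUT.search(line):
            current["outcome"] = f"timeout ({m.group(1)}s)"
    return attempts
```

    Opening the saved file (for example a hypothetical ppp_log.txt) and passing the file object to classify_attempts gives one entry per attempt. On 2.4.4 you would expect a run of timeouts with acname None; on 2.4.2 a "connected" entry with the AC name filled in.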

    I found this recent bug on the tracker, but it seems to have been resolved, so I should not be hitting it.
    https://redmine.pfsense.org/issues/8603

    Does anyone have a tip on how to continue debugging the issue?

    Thanks!
    Peter


  • Netgate Administrator

    Try running at the command line, or in Diag > Command Prompt:
    ifconfig igb0 promisc

    Assuming igb0 is the parent of the VLAN you are running PPPoE on. I saw one other case where that allowed the connection to complete, but it's not yet known why.

    Steve



  • Hi Steve,

    Thanks for the suggestion. I just tried it, but unfortunately the behavior is the same.
    I'm out today, but I can make a packet capture tomorrow if that helps with debugging.

    -peter


  • Netgate Administrator

    Yes, I would assign the parent interface and then run a packet capture there. You should be able to see both the PPP session and the VLAN tagging in that pcap to check they are correct.

    Steve
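
    As a sketch of what to look for in that capture: PPPoE discovery frames use EtherType 0x8863 and session frames 0x8864 (RFC 2516), and an 802.1Q tag (EtherType 0x8100) carries the VLAN ID in its low 12 bits. The Python below counts PPPoE frames per VLAN and packet type. It assumes a classic microsecond-resolution libpcap file with Ethernet framing (as written by tcpdump -w), not pcapng, and that the VLAN tags are visible in the capture; the function names are hypothetical.

```python
import struct
from collections import Counter

# PPPoE discovery packet codes (RFC 2516).
PPPOE_CODES = {0x09: "PADI", 0x07: "PADO", 0x19: "PADR", 0x65: "PADS", 0xA7: "PADT"}
ETHERTYPE_VLAN = 0x8100        # IEEE 802.1Q tag
ETHERTYPE_PPPOE_DISC = 0x8863  # PPPoE discovery stage
ETHERTYPE_PPPOE_SESS = 0x8864  # PPPoE session stage


def read_pcap(data):
    """Yield raw Ethernet frames from a classic (not pcapng) libpcap capture."""
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian file
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError(f"expected Ethernet (linktype 1), got {linktype}")
    offset = 24  # global header size
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


def summarize_pppoe(frames):
    """Count PPPoE frames keyed by (VLAN ID or None, packet type)."""
    counts = Counter()
    for frame in frames:
        if len(frame) < 14:
            continue
        (ethertype,) = struct.unpack("!H", frame[12:14])
        vlan, payload = None, 14
        if ethertype == ETHERTYPE_VLAN and len(frame) >= 18:
            tci, ethertype = struct.unpack("!HH", frame[14:18])
            vlan, payload = tci & 0x0FFF, 18  # VLAN ID is the low 12 bits
        if ethertype == ETHERTYPE_PPPOE_DISC and len(frame) >= payload + 6:
            code = frame[payload + 1]  # byte after VER/TYPE
            counts[(vlan, PPPOE_CODES.get(code, hex(code)))] += 1
        elif ethertype == ETHERTYPE_PPPOE_SESS:
            counts[(vlan, "session")] += 1
    return counts
```

    A successful discovery should show PADI, PADO, PADR and PADS on VLAN 7 before session frames appear. Seeing only PADI frames with no PADO would be consistent with the timeouts in the ppp log, pointing toward the tagging or the path to the provider rather than the PPP credentials.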



  • @pbosgraaf

    I may have the same issue see my post (https://forum.netgate.com/topic/136174/update-to-2-4-4-failed-in-hyper-v-2012r2).

    I am also running PPPoE and nothing can connect to the internet. I was also getting ALTQ errors with 2.4.4, which I did not get with 2.4.3. I hadn't twigged that it could be a PPPoE issue as well.

    I have also noticed in 2.4.3 that each time the router restarts, the PPPoE connection goes down about 30 seconds after initially coming up and needs to be restarted manually. I wonder what has changed in these recent releases.

    Ian



  • This seems to be an upgrade issue, not necessarily an issue with 2.4.4. I had the same issue today. Even reinstalling pfSense while keeping the config didn't fix it. However, installing a clean copy of pfSense 2.4.4 and then setting up my PPPoE again worked just fine. I'm now restoring my old config piece by piece (except the interface configuration).


  • Netgate Administrator

    I'd be very interested if you find something in the old config that is breaking it.

    Can you share the WAN and PPP sections? Omitting logins of course.

    Steve



  • It looks like there are multiple issues in my old config (judging by which restore options break PPPoE).

    Issue 1: My interface setup

    <interfaces>
    		<wan>
    			<if>pppoe0</if>
    			<blockpriv></blockpriv>
    			<blockbogons></blockbogons>
    			<descr><![CDATA[WAN]]></descr>
    			<spoofmac>00:00:00:00:00:00</spoofmac>
    			<enable></enable>
    			<ipaddr>pppoe</ipaddr>
    		</wan>
    		<lan>
    			<enable></enable>
    			<if>igb1</if>
    			<descr><![CDATA[LAN]]></descr>
    			<spoofmac></spoofmac>
    			<ipaddr>10.12.10.1</ipaddr>
    			<subnet>24</subnet>
    		</lan>
    		<opt1>
    			<descr><![CDATA[OPT1]]></descr>
    			<if>igb2</if>
    			<spoofmac></spoofmac>
    			<enable></enable>
    			<ipaddr>10.12.20.1</ipaddr>
    			<subnet>24</subnet>
    		</opt1>
    		<opt2>
    			<descr><![CDATA[VPN-OUT]]></descr>
    			<if>ovpnc2</if>
    			<enable></enable>
    			<spoofmac></spoofmac>
    		</opt2>
    		<opt3>
    			<descr><![CDATA[VPN-IN]]></descr>
    			<if>ovpns1</if>
    			<enable></enable>
    			<ipaddr>10.12.30.1</ipaddr>
    			<subnet>24</subnet>
    			<spoofmac></spoofmac>
    		</opt3>
    	</interfaces>
    
    <ppps>
    		<ppp>
    			<ptpid>0</ptpid>
    			<type>pppoe</type>
    			<if>pppoe0</if>
    			<ports>igb0.7</ports>
    			<username>****</username>
    			<password>****</password>
    			<provider>****</provider>
    			<bandwidth></bandwidth>
    			<mtu></mtu>
    			<mru></mru>
    			<mrru></mrru>
    		</ppp>
    	</ppps>
    	<vlans>
    		<vlan>
    			<if>igb0</if>
    			<tag>7</tag>
    			<pcp></pcp>
    			<descr><![CDATA[VDSL]]></descr>
    			<vlanif>igb0.7</vlanif>
    		</vlan>
    	</vlans>
    

    opt3 seems to cause the issue, since even setting up the fresh install this way breaks PPPoE immediately. This worked fine in 2.4.3-p1; opt3 was mainly used to serve DNS to OpenVPN clients.

    Issue 2: My package configs (pfBlockerNG 2.1.4_13, OpenVPN Client Export Utility 1.4.17_2, Traffic Totals 1.2.4)
    I have no idea what the issue is here, and posting the complete configs would be a bit much, but importing the package config also seems to break PPPoE. I haven't gotten around to setting these packages up in my fresh install, so I don't (yet) know whether setting them up without the old config will also break PPPoE.

    Imports that worked without any issues: Aliases, DNS Resolver, DHCP Server, DHCPv6 Server, System, Wake-on-LAN
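
    To check other config backups for the pattern identified in issue 1 (an OpenVPN instance assigned as an interface with a static address, like opt3 on ovpns1), a small Python sketch can list the interface assignments from config.xml. It assumes <interfaces> is a direct child of the root element, as in a full pfSense backup, or is itself the root. The function names are hypothetical, and the sketch only flags the pattern; it does not explain why that setup breaks PPPoE on 2.4.4.

```python
import xml.etree.ElementTree as ET


def interface_summary(config_xml):
    """Return (name, descr, if, ipaddr, subnet) for each assigned interface."""
    root = ET.fromstring(config_xml)
    # Full backups have <pfsense><interfaces>...; accept a bare <interfaces> too.
    section = root if root.tag == "interfaces" else root.find("interfaces")
    if section is None:
        return []
    return [
        (
            iface.tag,  # e.g. wan, lan, opt3
            iface.findtext("descr", default=""),
            iface.findtext("if", default=""),
            iface.findtext("ipaddr", default=""),
            iface.findtext("subnet", default=""),
        )
        for iface in section
    ]


def ovpn_with_static_ip(rows):
    """Rows whose device is an OpenVPN instance (ovpnsN/ovpncN) with an address set."""
    return [row for row in rows if row[2].startswith("ovpn") and row[3]]
```

    Run against the configuration above, only opt3 (ovpns1, 10.12.30.1/24) would be flagged; opt2 is also an OpenVPN assignment but has no address configured.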


  • Netgate Administrator

    Hmm, one thing that might be causing this is the new gateway handling in 2.4.4.

    Does your PPPoE connection fail to connect or just fail to route traffic?

    Check that it is the default gateway in System > Routing > Gateways when it fails. Set it explicitly as the default if it is not.

    Steve



  • It does not connect at all:

    Oct 3 15:58:43	ppp		[wan_link0] Link: reconnection attempt 1 in 3 seconds
    Oct 3 15:58:46	ppp		[wan_link0] Link: reconnection attempt 1
    Oct 3 15:58:46	ppp		[wan_link0] PPPoE: can't connect "[13]:"->"mpd9828-0" and "[11]:"->"left": No such file or directory
    Oct 3 15:58:46	ppp		[wan_link0] can't remove hook mpd9828-0 from node "[13]:": No such file or directory
    Oct 3 15:58:46	ppp		[wan_link0] Link: DOWN event
    Oct 3 15:58:46	ppp		[wan_link0] LCP: Down event
    

  • Netgate Administrator

    At the command line or in Diag > Command Prompt try running:
    clog /var/log/ppp.log | grep secret

    You might need to do that after rebooting, as your ppp log will be exceptionally busy.

    If it reports a missing secret, try re-creating the PPP interface to generate the file again. We have seen that with one other user so far.

    Steve



  • Just try rebooting.
    I have the same error, and it goes away after a reboot.
    No issues in 2.4.3.
    https://forum.netgate.com/topic/135920/pfsense-2-4-4-fails-all-pppoe-s-after-disabling-one/4



  • @netblues Rebooting does fix it, thanks. However, it doesn't seem ideal to have to reboot anytime you assign a (virtual) interface.



  • Try disabling and re-enabling it; you will probably end up with the same issue. The thing is, no one has a clue what is causing this.



  • @netblues It actually looks like ANY change to ANY interface will cause these PPPoE issues. I just changed my OPT1 static IP address, and even that killed my PPPoE until I rebooted.


  • Netgate Administrator

    Did you try running that command? I assume it did not log any errors there?

    Try running ifconfig -av when it's working and when it's not. See if there's anything different there.

    It's hard to believe it would change when altering another interface, but you could also check the generated file at /var/etc/mpd_wan.conf. Does that change?

    Steve
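
    Comparing two ifconfig -av outputs by eye is error-prone. Assuming both are saved as plain text, a Python sketch using the standard difflib module can show which interface blocks differ and the exact changed lines. It assumes the FreeBSD layout, where each interface block starts with an unindented "name: flags=..." line followed by indented detail lines; the function names are hypothetical.

```python
import difflib


def split_interfaces(output):
    """Map interface name -> its lines in FreeBSD `ifconfig -av` output."""
    blocks = {}
    current = None
    for line in output.splitlines():
        if line and not line[0].isspace():
            # Unindented line starts a new block: "igb0.7: flags=...".
            current = line.split(":", 1)[0]
            blocks[current] = [line]
        elif current is not None:
            blocks[current].append(line)
    return blocks


def changed_interfaces(working, broken):
    """Names of interfaces whose block differs (or exists in only one output)."""
    a, b = split_interfaces(working), split_interfaces(broken)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


def diff_ifconfig(working, broken):
    """Unified diff of the two outputs, one line of context per change."""
    return list(difflib.unified_diff(
        working.splitlines(), broken.splitlines(),
        fromfile="working", tofile="broken", lineterm="", n=1,
    ))
```

    Checking changed_interfaces first narrows the comparison to the interfaces worth looking at (for example igb0, igb0.7 or ng0), and diff_ifconfig then shows the lines that changed within them.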



  • @stephenw10 Thank you very much for your continued efforts. And just to be clear: My PPPoE connection works fine unless I change any setting of any interface - then I have to reboot.

    clog [...] secret did not result in any output.

    Here's what ifconfig -av returns when things are fine:

    [Screenshot: ifconfig -av output with PPPoE working, 2018-10-10 17:51:43]

    And when PPPoE is broken:

    [Screenshot: ifconfig -av output with PPPoE broken, 2018-10-10 17:51:53]

    The conf file does not change.

    [Sorry for posting screenshots, anti-filter would not let me post the output as text]



  • Please do something about Akismet. It doesn't allow posts containing command output.

    Situation is exactly the same over here.


  • Rebel Alliance Developer Netgate

    @netblues said in PPPoE problems after upgrading from 2.4.2 to 2.4.4:

    Please do something about Akismet. It doesn't allow posts containing command output.

    Situation is exactly the same over here.

    1. Wrap the output in code tags
      or
    2. Attach long output as a text file, rather than dumping it all in the post

    Both are good forum etiquette and will avoid issues with posts being flagged as spam.



  • Unfortunately, it's the text inside code tags that Akismet rejects.
    Its heuristic analysis of what is spam and what is not fails miserably.
    I'll try attachments next time.

    On topic: I have three instances. One is a physical box running on Intel NUC hardware, and two are virtual machines under CentOS KVM.
    All three of them talk to the same bridge-mode modems via VLAN trunks.
    The two virtual ones have been reverted to 2.4.3-p1 and work fine.
    The physical one (on 2.4.4, upgraded from 2.4.3-p1) needs a reboot every time something changes on PPP
    (the provider allows three concurrent PPP sessions).
    The only purpose of the physical box is to test 2.4.4 issues. Unfortunately, since it only has one physical LAN port, I can't test without VLANs.
    On the other hand, if everyone had issues with simple PPPoE, there would be lots of complaint threads by now.
    I suspect it has to do with permissions and/or file ownership, which is corrected upon reboot. If it were a config issue, rebooting would never fix it.
    Any pointers on what to look for?