A Crazy Problem . . .



  • I have my PfSense box connected via PPPoE to a DSL modem (in bridge mode).  I run 15/1 Mbits on my DSL line (but my problem appears to be independed of DSL speed as I had the same problem when I was running at 10/1 and 5/0.768).  My problem is that I lose my ISP's WAN IP address at least once, sometimes as often as three times, per day.  I can duplicate the problem by running a flash-based speed test at the DSL Reports' web page.  The speed test appears to run fine, but I always lose my DSL connection (WAN IP) after the test runs (oddly, the DSL sync light on my modem remains green–indicating that I still have sync between my modem and the ISP's DSLAM).  The only way I can restore my connection is to reboot my PfSense router.  When I do so, I end up with a new IP address from my ISP (the interesting thing about this fact is that I can reboot my router all day and keep the same WAN IP address as I had when I started unless I have this problem).  I have also had this same problem with consumer grade routers that I have used in prior times, so it is not a problem unique to PfSense.  PfSense, however, (unlike the previous routers I have owned) allows me to look at its log when the problem occurs.  Here is what the log says every time I have the problem and lose my WAN IP:

    apinger: /usr/local/bin/rrdtool respawning too fast, waiting 300s.
    apinger: ALARM: WAN(10.0.68.1) *** down ***

    I have had my ISP change my DSLAM port, look at my DSL line for trouble, to no avail--everything checks out o.k.  Somehow, I don't think my problem relates to DSL line stability.  It is almost as if my ISP's central office switches and routers don't like my data packets when I do intense uploading/downloading (such as happens during speed tests) and locks up my WAN IP address.  Does anyone have a clue as to what may be going on in light of the log data I have quoted above?  Thanks for your help, please. ???


  • Netgate Administrator

    apinger: /usr/local/bin/rrdtool respawning too fast, waiting 300s.

    This is nothing to worry about. See: http://redmine.pfsense.org/issues/1399

    apinger: ALARM: WAN(10.0.68.1) *** down ***

    Your WAN IP is in a private range, is that correct? It seems slightly odd given that you're using pppoe to your ISP.

    Steve

    Edit: Shocking grammar!



  • @stephenw10:

    You WAN IP is in a private range, is that correct? It seems slightly odd given that you using pppoe to your ISP.

    Probably not, might be, but it's common to get a public IP assigned and have a private IP as your gateway with PPPoE. It doesn't have to be the same subnet or anywhere near it on PPP*.

    The reason for the timeout is just that the ISP's end of the connection doesn't respond to pings, just putting something on the Internet in there (8.8.8.8 or something) will fix.

    That's likely not related to the problem I suspect. What do the PPP logs show?



  • The 10.0.68.1 appears to be the address of my ISP's equipment in the central office–it is not the WAN IP address I had before the failure.

    "This seems to be an assumption made from apinger that after an error occured you cannot respawn rrdtool faster than 300secoonds.
    You just loose one round of update to the rrd's."

    What's an "rrd" and why does it need to be respawned?

    I'll look into the ppp log next time it goes down.



  • cmb:

    Here is a copy of the relevant portion of my PPP log before and after a fall:

    Mar 29 21:58:10 ppp: [wan] 10.8.66.1 is OK
    Mar 29 21:58:10 ppp: [wan] IPCP: SendConfigAck #215
    Mar 29 21:58:10 ppp: [wan] IPADDR 10.8.66.1
    Mar 29 21:58:10 ppp: [wan] IPCP: state change Ack-Rcvd –> Opened
    Mar 29 21:58:10 ppp: [wan] IPCP: LayerUp
    Mar 29 21:58:10 ppp: [wan] 79.176.16.93 -> 10.8.66.1
    Mar 29 21:58:10 ppp: [wan] IFACE: Up event
    Mar 29 22:41:03 ppp: [wan_link0] LCP: no reply to 1 echo request(s)
    Mar 29 22:41:13 ppp: [wan_link0] LCP: no reply to 2 echo request(s)
    Mar 29 22:41:23 ppp: [wan_link0] LCP: no reply to 3 echo request(s)
    Mar 29 22:41:33 ppp: [wan_link0] LCP: no reply to 4 echo request(s)
    Mar 29 22:41:43 ppp: [wan_link0] LCP: no reply to 5 echo request(s)
    Mar 29 22:41:43 ppp: [wan_link0] LCP: peer not responding to echo requests
    Mar 29 22:41:43 ppp: [wan_link0] LCP: state change Opened –> Stopping
    Mar 29 22:41:43 ppp: [wan_link0] Link: Leave bundle "wan"
    Mar 29 22:41:43 ppp: [wan] Bundle: Status update: up 0 links, total bandwidth 9600 bps
    Mar 29 22:41:43 ppp: [wan] IPCP: Close event
    Mar 29 22:41:43 ppp: [wan] IPCP: state change Opened –> Closing
    Mar 29 22:41:43 ppp: [wan] IPCP: SendTerminateReq #4
    Mar 29 22:41:43 ppp: [wan] IPCP: LayerDown
    Mar 29 22:41:43 ppp: [wan] IFACE: Down event
    Mar 29 22:41:43 ppp: [wan] IPCP: Down event
    Mar 29 22:41:43 ppp: [wan] IPCP: LayerFinish
    Mar 29 22:41:43 ppp: [wan] Bundle: No NCPs left. Closing links…
    Mar 29 22:41:43 ppp: [wan] IPCP: state change Closing –> Initial
    Mar 29 22:41:43 ppp: [wan_link0] LCP: SendTerminateReq #2
    Mar 29 22:41:43 ppp: [wan_link0] LCP: LayerDown
    Mar 29 22:41:45 ppp: [wan_link0] LCP: SendTerminateReq #3
    Mar 29 22:41:47 ppp: [wan_link0] LCP: state change Stopping –> Stopped
    Mar 29 22:41:47 ppp: [wan_link0] LCP: LayerFinish
    Mar 29 22:41:47 ppp: [wan_link0] PPPoE: connection closed
    Mar 29 22:41:47 ppp: [wan_link0] Link: DOWN event
    Mar 29 22:41:47 ppp: [wan_link0] LCP: Down event
    Mar 29 22:41:47 ppp: [wan_link0] LCP: state change Stopped –> Starting
    Mar 29 22:41:47 ppp: [wan_link0] LCP: LayerStart
    Mar 29 22:41:47 ppp: [wan_link0] Link: reconnection attempt 1 in 3 seconds
    Mar 29 22:41:50 ppp: [wan_link0] Link: reconnection attempt 1
    Mar 29 22:41:50 ppp: [wan_link0] PPPoE: Connecting to ''
    Mar 29 22:41:59 ppp: [wan_link0] PPPoE connection timeout after 9 seconds
    Mar 29 22:41:59 ppp: [wan_link0] Link: DOWN event
    Mar 29 22:41:59 ppp: [wan_link0] LCP: Down event
    Mar 29 22:41:59 ppp: [wan_link0] Link: reconnection attempt 2 in 1 seconds
    Mar 29 22:42:00 ppp: [wan_link0] Link: reconnection attempt 2
    Mar 29 22:42:00 ppp: [wan_link0] PPPoE: Connecting to ''
    Mar 29 22:42:09 ppp: [wan_link0] PPPoE connection timeout after 9 seconds
    Mar 29 22:42:09 ppp: [wan_link0] Link: DOWN event
    Mar 29 22:42:09 ppp: [wan_link0] LCP: Down event
    Mar 29 22:42:09 ppp: [wan_link0] Link: reconnection attempt 3 in 2 seconds
    Mar 29 22:42:11 ppp: [wan_link0] Link: reconnection attempt 3
    Mar 29 22:42:11 ppp: [wan_link0] PPPoE: Connecting to ''
    Mar 29 22:42:20 ppp: [wan_link0] PPPoE connection timeout after 9 seconds
    Mar 29 22:42:20 ppp: [wan_link0] Link: DOWN event
    Mar 29 22:42:20 ppp: [wan_link0] LCP: Down event


  • Netgate Administrator

    @Nonsense:

    What's an "rrd" and why does it need to be respawned?

    RRDtool is used to record network stats in pfSense. It is (re)started when a new connection is made.

    Comparing this with my own log it looks like there is quite a lot missing from your successful login record.
    However I don't know if it would be any help, nor do I know how much of that info you want to be posting on a public forum.

    Clearly you are not getting past the PPPoE connection, what does it log on a successful attempt?

    From my own logs (which are unfortunately reversed compared to yours!):

    
    Mar 30 12:22:38 	ppp: [wan_link0] Link: UP event
    Mar 30 12:22:38 	ppp: [wan_link0] PPPoE: connection successful
    Mar 30 12:22:38 	ppp: PPPoE: rec'd ACNAME "Vigor2000 PPPoE"
    Mar 30 12:22:38 	ppp: [wan_link0] PPPoE: Connecting to 'Plusnet'
    Mar 30 12:22:38 	ppp: [wan_link0] Link: reconnection attempt 1
    Mar 30 12:22:36 	ppp: [wan_link0] Link: reconnection attempt 1 in 2 seconds
    Mar 30 12:22:36 	ppp: [wan_link0] LCP: LayerStart
    Mar 30 12:22:36 	ppp: [wan_link0] LCP: state change Stopped --> Starting
    
    

    Steve



  • Here is my PPP log when I reboot and have a sucessful logon after I am knocked off (I have altered my logon identity and IP addresses so as to protect my identity):

    Mar 31 07:33:34 ppp: PPPoE: rec'd ACNAME "BSTNMA-CAMBMABE-ERXG01"
    Mar 31 07:33:34 ppp: [wan_link0] PPPoE: connection successful
    Mar 31 07:33:34 ppp: [wan_link0] Link: UP event
    Mar 31 07:33:34 ppp: [wan_link0] LCP: Up event
    Mar 31 07:33:34 ppp: [wan_link0] LCP: state change Starting –> Req-Sent
    Mar 31 07:33:34 ppp: [wan_link0] LCP: SendConfigReq #1
    Mar 31 07:33:34 ppp: [wan_link0] PROTOCOMP
    Mar 31 07:33:34 ppp: [wan_link0] MRU 1492
    Mar 31 07:33:34 ppp: [wan_link0] MAGICNUM 5f3f6028
    Mar 31 07:33:34 ppp: [wan_link0] LCP: rec'd Configure Request #161 (Req-Sent)
    Mar 31 07:33:34 ppp: [wan_link0] MRU 1492
    Mar 31 07:33:34 ppp: [wan_link0] AUTHPROTO PAP
    Mar 31 07:33:34 ppp: [wan_link0] MAGICNUM 2f5c34a3
    Mar 31 07:33:34 ppp: [wan_link0] LCP: SendConfigAck #161
    Mar 31 07:33:34 ppp: [wan_link0] MRU 1492
    Mar 31 07:33:34 ppp: [wan_link0] AUTHPROTO PAP
    Mar 31 07:33:34 ppp: [wan_link0] MAGICNUM 2f5c34a3
    Mar 31 07:33:34 ppp: [wan_link0] LCP: state change Req-Sent –> Ack-Sent
    Mar 31 07:33:34 ppp: [wan_link0] LCP: rec'd Configure Ack #1 (Ack-Sent)
    Mar 31 07:33:34 ppp: [wan_link0] PROTOCOMP
    Mar 31 07:33:34 ppp: [wan_link0] MRU 1492
    Mar 31 07:33:34 ppp: [wan_link0] MAGICNUM 5f3f6028
    Mar 31 07:33:34 ppp: [wan_link0] LCP: state change Ack-Sent –> Opened
    Mar 31 07:33:34 ppp: [wan_link0] LCP: auth: peer wants PAP, I want nothing
    Mar 31 07:33:34 ppp: [wan_link0] PAP: using authname "vze3b54s"
    Mar 31 07:33:34 ppp: [wan_link0] PAP: sending REQUEST #1 len: 22
    Mar 31 07:33:34 ppp: [wan_link0] LCP: LayerUp
    Mar 31 07:33:34 ppp: [wan_link0] PAP: rec'd ACK #1 len: 5
    Mar 31 07:33:34 ppp: [wan_link0] LCP: authorization successful
    Mar 31 07:33:34 ppp: [wan_link0] Link: Matched action 'bundle "wan" ""'
    Mar 31 07:33:34 ppp: [wan_link0] Link: Join bundle "wan"
    Mar 31 07:33:34 ppp: [wan] Bundle: Status update: up 1 link, total bandwidth 64000 bps
    Mar 31 07:33:34 ppp: [wan] IPCP: Open event
    Mar 31 07:33:34 ppp: [wan] IPCP: state change Initial –> Starting
    Mar 31 07:33:34 ppp: [wan] IPCP: LayerStart
    Mar 31 07:33:34 ppp: [wan] IPCP: Up event
    Mar 31 07:33:34 ppp: [wan] IPCP: state change Starting –> Req-Sent
    Mar 31 07:33:34 ppp: [wan] IPCP: SendConfigReq #1
    Mar 31 07:33:34 ppp: [wan] IPADDR 0.0.0.0
    Mar 31 07:33:34 ppp: [wan] COMPPROTO VJCOMP, 16 comp. channels, no comp-cid
    Mar 31 07:33:34 ppp: [wan] IPCP: rec'd Configure Reject #1 (Req-Sent)
    Mar 31 07:33:34 ppp: [wan] COMPPROTO VJCOMP, 16 comp. channels, no comp-cid
    Mar 31 07:33:34 ppp: [wan] IPCP: SendConfigReq #2
    Mar 31 07:33:34 ppp: [wan] IPADDR 0.0.0.0
    Mar 31 07:33:34 ppp: [wan] IPCP: rec'd Configure Request #154 (Req-Sent)
    Mar 31 07:33:34 ppp: [wan] IPADDR 10.8.63.1
    Mar 31 07:33:34 ppp: [wan] 10.8.63.1 is OK
    Mar 31 07:33:34 ppp: [wan] IPCP: SendConfigAck #154
    Mar 31 07:33:34 ppp: [wan] IPADDR 10.8.63.1
    Mar 31 07:33:34 ppp: [wan] IPCP: state change Req-Sent –> Ack-Sent
    Mar 31 07:33:34 ppp: [wan] IPCP: rec'd Configure Nak #2 (Ack-Sent)
    Mar 31 07:33:34 ppp: [wan] IPADDR 92.133.2.223
    Mar 31 07:33:34 ppp: [wan] 92.233.4.228 is OK
    Mar 31 07:33:34 ppp: [wan] IPCP: SendConfigReq #3
    Mar 31 07:33:34 ppp: [wan] IPADDR 92.133.2.223
    Mar 31 07:33:34 ppp: [wan] IPCP: rec'd Configure Ack #3 (Ack-Sent)
    Mar 31 07:33:34 ppp: [wan] IPADDR 92.133.2.223
    Mar 31 07:33:34 ppp: [wan] IPCP: state change Ack-Sent –> Opened
    Mar 31 07:33:34 ppp: [wan] IPCP: LayerUp
    Mar 31 07:33:34 ppp: [wan] 92.133.2.223 -> 10.8.63.1
    Mar 31 07:33:34 ppp: [wan] IFACE: Up event



  • I have googled "LCP: no reply to 1 echo request(s)" and discovered that other PfSense and even Monowall users have had similar PPPoE disconnect problems.  Could my PfSense firewall be causing my connection to go down (e.g., when I run the speed tests at DSLReports and my connection goes down I notice blocked outbound entries in my firewall log)?

    The only "abnormality" that my phone company reports about my connection is that they cannot ping my home router.  I, of course, have ICMP ping turned off in my firewall.  Could the lack of being able to be pinged cause my central office DSLAM to drop my connection?


  • Netgate Administrator

    @Nonsense:

    Could the lack of being able to be pinged cause my central office DSLAM to drop my connection?

    I have never encountered that but it is conceivable.

    There are many threads here about pppoe problems unfortunately. I have never had any trouble though.

    I can't see anything in your log that looks unusual (not that I am any sort of expert!) other than the connection gateway being in a private subnet.

    Steve


Log in to reply