Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Tracing reason for brief 50% packet loss, lag, and connection termination

    Scheduled Pinned Locked Moved General pfSense Questions
    7 Posts 2 Posters 3.5k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • S
      stilez
      last edited by

      I have a machine and PFsense platform (ADSL) that should be able to cope with any sane demand placed on it, but at times doesn't, in a specific situation I get insane lag and packet loss, and the link is reset.  I'm not sure how to fault-trace the issue(s).  I think the issue is down to my ISP connection or PFsense. I really need ideas.

      The setup here is simple and high spec, and should be fine with a burst of (pretty modest) peak traffic here.  But for reasons unknown it isn't.

      Faced with a modest burst of http and dns requests, PFsense RRD>Quality shows that WAN RTT usually ~30ms increases to ~200 - 250ms and packet loss rises from ~ 0.6% to about 35 - 40%. That alone suggests the issue arises with PFsense or maybe the connection, because I don't think that would be affected by anything on the LAN or PC.  After a while the PPP connection resets.  PFsense has not shown any sign of overload (memory, CPU, sessions etc seem to remain normal and low throughout).  I have no idea why any of this happens.

      DATA BELOW - hardware/ISP/connection/config info, and PFsense system + PPP logs.

      Any ideas?

      SYSTEM SETUP AND CONFIG INFO

      WHAT HAPPENS

      RELEVANT SNIP OF PFSENSE SYSTEM LOG

      (Nothing for previous 30 mins)
      00:43:15 pfsense ppp: [wan_link0] LCP: rec'd Terminate Request #0 (Opened)
      00:43:15 pfsense ppp: [wan_link0] LCP: state change Opened --> Stopping
      00:43:15 pfsense ppp: [wan_link0] Link: Leave bundle "wan"
      00:43:15 pfsense ppp: [wan] Bundle: Status update: up 0 links, total bandwidth 9600 bps
      00:43:15 pfsense ppp: [wan] IPCP: Close event
      00:43:15 pfsense ppp: [wan] IPCP: state change Opened --> Closing
      00:43:15 pfsense ppp: [wan] IPCP: SendTerminateReq #129
      00:43:15 pfsense ppp: [wan] IPCP: LayerDown
      00:43:15 pfsense ppp-linkdown: Removing states from [PFSENSE PUBLIC IP]
      00:43:15 pfsense ppp-linkdown: Removing states to [ISP GATEWAY IP]
      00:43:15 pfsense check_reload_status: Rewriting resolv.conf
      00:43:15 pfsense ppp: [wan] IFACE: Down event
      00:43:15 pfsense ppp: [wan] IPCP: Down event
      00:43:15 pfsense ppp: [wan] IPCP: LayerFinish
      00:43:15 pfsense ppp: [wan] Bundle: No NCPs left. Closing links...
      00:43:15 pfsense ppp: [wan] IPCP: state change Closing --> Initial
      00:43:15 pfsense ppp: [wan_link0] LCP: SendTerminateAck #71
      00:43:15 pfsense ppp: [wan_link0] LCP: LayerDown
      00:43:15 pfsense ppp: [wan_link0] rec'd proto IP during terminate phase
      00:43:16 pfsense ppp: last message repeated 22 times
      00:43:16 pfsense ppp: [wan_link0] LCP: rec'd Terminate Request #3 (Stopping)
      00:43:16 pfsense ppp: [wan_link0] LCP: SendTerminateAck #72
      00:43:17 pfsense ppp: [wan_link0] LCP: state change Stopping --> Stopped
      00:43:17 pfsense ppp: [wan_link0] LCP: LayerFinish
      00:43:17 pfsense ppp: [wan_link0] PPPoE: connection closed
      00:43:17 pfsense ppp: [wan_link0] Link: DOWN event
      00:43:17 pfsense ppp: [wan_link0] LCP: Down event
      00:43:17 pfsense ppp: [wan_link0] LCP: state change Stopped --> Starting
      00:43:17 pfsense ppp: [wan_link0] LCP: LayerStart
      00:43:17 pfsense ppp: [wan_link0] Link: reconnection attempt 1 in 1 seconds
      00:43:18 pfsense ppp: [wan_link0] Link: reconnection attempt 1
      00:43:18 pfsense ppp: [wan_link0] PPPoE: Connecting to ''
      00:43:18 pfsense ppp: PPPoE: rec'd ACNAME "[ROUTERNAME] PPPoE"
      00:43:18 pfsense ppp: [wan_link0] PPPoE: connection successful
      00:43:18 pfsense ppp: [wan_link0] Link: UP event
      00:43:18 pfsense ppp: [wan_link0] LCP: Up event
      00:43:18 pfsense ppp: [wan_link0] LCP: state change Starting --> Req-Sent
      

      RELEVANT SNIP OF PFSENSE PPP LOG

      00:00:08 pfsense ppp: [wan] IFACE: Up event
      00:43:15 pfsense ppp: [wan_link0] LCP: rec'd Terminate Request #0 (Opened)
      00:43:15 pfsense ppp: [wan_link0] LCP: state change Opened --> Stopping
      00:43:15 pfsense ppp: [wan_link0] Link: Leave bundle "wan"
      00:43:15 pfsense ppp: [wan] Bundle: Status update: up 0 links, total bandwidth 9600 bps
      00:43:15 pfsense ppp: [wan] IPCP: Close event
      00:43:15 pfsense ppp: [wan] IPCP: state change Opened --> Closing
      00:43:15 pfsense ppp: [wan] IPCP: SendTerminateReq #129
      00:43:15 pfsense ppp: [wan] IPCP: LayerDown
      00:43:15 pfsense ppp: [wan] IFACE: Down event
      00:43:15 pfsense ppp: [wan] IPCP: Down event
      00:43:15 pfsense ppp: [wan] IPCP: LayerFinish
      00:43:15 pfsense ppp: [wan] Bundle: No NCPs left. Closing links...
      00:43:15 pfsense ppp: [wan] IPCP: state change Closing --> Initial
      00:43:15 pfsense ppp: [wan_link0] LCP: SendTerminateAck #71
      00:43:15 pfsense ppp: [wan_link0] LCP: LayerDown
      00:43:15 pfsense ppp: [wan_link0] rec'd proto IP during terminate phase
      00:43:16 pfsense ppp: last message repeated 22 times
      00:43:16 pfsense ppp: [wan_link0] LCP: rec'd Terminate Request #3 (Stopping)
      00:43:16 pfsense ppp: [wan_link0] LCP: SendTerminateAck #72
      00:43:17 pfsense ppp: [wan_link0] LCP: state change Stopping --> Stopped
      00:43:17 pfsense ppp: [wan_link0] LCP: LayerFinish
      00:43:17 pfsense ppp: [wan_link0] PPPoE: connection closed
      00:43:17 pfsense ppp: [wan_link0] Link: DOWN event
      00:43:17 pfsense ppp: [wan_link0] LCP: Down event
      00:43:17 pfsense ppp: [wan_link0] LCP: state change Stopped --> Starting
      00:43:17 pfsense ppp: [wan_link0] LCP: LayerStart
      00:43:17 pfsense ppp: [wan_link0] Link: reconnection attempt 1 in 1 seconds
      00:43:18 pfsense ppp: [wan_link0] Link: reconnection attempt 1
      00:43:18 pfsense ppp: [wan_link0] PPPoE: Connecting to ''
      00:43:18 pfsense ppp: PPPoE: rec'd ACNAME "[ROUTERNAME] PPPoE"
      00:43:18 pfsense ppp: [wan_link0] PPPoE: connection successful
      00:43:18 pfsense ppp: [wan_link0] Link: UP event
      00:43:18 pfsense ppp: [wan_link0] LCP: Up event
      00:43:18 pfsense ppp: [wan_link0] LCP: state change Starting --> Req-Sent
      00:43:18 pfsense ppp: [wan_link0] LCP: SendConfigReq #73
      
      1 Reply Last reply Reply Quote 0
      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        I am running a Draytek V120 connected to a 20Mbps ADSL connection in the UK and have not seen such problems.
        Do you have the latest firmware on the Draytek? Which device is it? I ask because I had trouble with it initially because it had the wrong firmware. Draytek released a series of firmwares tuned for different line conditions. For the V120 at least.

        If I run a bandwidth test I see the ping times go up substantially. See my screen shot for one day average.

        My initial thoughts on this were that apinger was resetting the connection due to excessive ping times but the system log shows that's not the case.

        Perhaps your ISP is resetting it due to excessive packets per second? Some automated process to prevent a packet flood attack. You could try setting a limiter to reduce the PPS.
        You seem to have a very large number of states from your one client machine.

        Steve

        ![adsl ping times.jpg](/public/imported_attachments/1/adsl ping times.jpg)
        ![adsl ping times.jpg_thumb](/public/imported_attachments/1/adsl ping times.jpg_thumb)

        1 Reply Last reply Reply Quote 0
        • S
          stilez
          last edited by

          Thanks Stephen.

          I can ask the ISP on Monday. ISP's night support say they don't reset or do anything they know of to cause this, but will check anyway what they can.  Network here has peaked at 72k sessions in the past, transiently, during startup of this program.  It's a busy network for a household and a lot of intense burst use can happen now and then, in a way that's why I'm using a router like PFsense to handle it.  The Draytek is V2800 series and has the latest 3.3.7.3 firmware (October 2012) - standard "232201" version. I updated it from 3.3.7.1 (2011) as part of testing, and the behaviour didn't change.

          Increased bandwidth leading to slow ping responses is interesting in your comment. Why would that happen? I'm not saturating my pipe (as far as I know), and PFsense shows max 1.6 Mbits and 180 pkts/sec which is a modest burst compared to the 10 mbit link it is using, but even so it does seem something is saturating.  The RTT is from PFsense across copper to my ISP's own gateway, both PFsense and the ISP pipe should have no problems with this current kind of burst. My ping graph is below, a bit different from yours.

          The logs show WAN "LCP: rec'd Terminate Request #0" that seems to happen very early in the session shut down, and presumably has the same cause as lag and packet loss, but I'm not sure how the request is raised, whether by ISP or if it's mis described and raised internally by PFsense.  Also it doesn't explain packet lag/loss issues that precede it.  Would WAN packet capture show if this is actually an external signal?

          I suspect a bit more, that it may be due to a PFsense issue, because when I rebooted PFsense I found the behavior had reduced.  It now hits RTT ~ 150ms and ~20% packet loss, just by rebooting PFsense (was 200-250ms, 35-40% before). If it was an ISP issue then surely rebooting the router (not ADSL link) wouldn't help. Also there's an odd system log entry - the pf filter is reloaded at the same time but no obvious cause for this and I'm not doing anything at the time with filters.

          Does any of this help?

          ![Pfsense ping.png](/public/imported_attachments/1/Pfsense ping.png)
          ![Pfsense ping.png_thumb](/public/imported_attachments/1/Pfsense ping.png_thumb)

          1 Reply Last reply Reply Quote 0
          • stephenw10S
            stephenw10 Netgate Administrator
            last edited by

            @stilez:

            Increased bandwidth leading to slow ping responses is interesting in your comment. Why would that happen?

            Because the outgoing and returning ping packets have to wait while other packets are processed. You can mitigate this to some extent using QoS to prioritise ping if it's all important.

            @stilez:

            My ping graph is below, a bit different from yours.

            Yes. That level of packet loss seems completely unacceptable whatever you're doing. You'll notice mine is at 0% max.

            @stilez:

            Would WAN packet capture show if this is actually an external signal?

            Good question. As far as know the packets capture only captures IP packets, layer 3. Thus if this is at the ppp layer it may not. I'm far from an expert in this though so I stand to be corrected.  ;)

            @stilez:

            If it was an ISP issue then surely rebooting the router (not ADSL link) wouldn't help. Also there's an odd system log entry - the pf filter is reloaded at the same time but no obvious cause for this and I'm not doing anything at the time with filters.

            The pf filter is reloaded when ever there is a change on any of the interfaces, even if the filter is set to not filter anything. That's normal.

            With an ADSL connection you are dealing with several networks. My ISP is Plusnet (who are awesome by the way!). My pfSense box talks directly to Plusnet's PPP servers. The Draytek modem talks to BT's equipment in the exchange but that connection should be transparent to the pfSense-Plusnet connection.
            Thus if you have a problem at your ISP rebooting your pfSense box may change which PPP server you end up connected to, for example.

            I have had similar problems in the past that turned out to be a faulty line card in the exchange. That's happened to me several times. Statistical anomaly or common fault?  ::)

            Do you have any line data from your modem that might indicate a problem? See my screenshot.

            Also this seems relevant: http://forum.pfsense.org/index.php/topic,32653.msg168773.html#msg168773

            Steve

            v120status.jpg
            v120status.jpg_thumb

            1 Reply Last reply Reply Quote 0
            • stephenw10S
              stephenw10 Netgate Administrator
              last edited by

              Just noticed, 10Mbps is unusual for a UK ADSL connection which are usually 8, 20 or 24.
              Are you being rate adapted down? Are you at the end of a long line?

              Steve

              1 Reply Last reply Reply Quote 0
              • S
                stilez
                last edited by

                Thanks Steve, I'll have to look at some of those ideas later, it's interesting.  Quick answer on the line rate and stats - the 10mbit is the usual rate adapted - it's usually between 9 - 10 mbits depending on the day. I'm 3/4 mile from the exchange, so getting near the top end of what I can hope for, I guess.  Though it's a bit lower than usual today at 5.3 mbit (still much higher than the router is trying to handle at <2 mbit/sec).  SNR is also usually reported as 5-6dB so it's high today, but also the connection's been up and down while testing so the speed's being downrated for that reason too.

                Current line stats:

                TX Cells = 1.567 M, RX Cells = 1.475 M, TX CRC errs = 0, RX CRC errs = 0
                ADSL = ADSL2(G.992.3) SHOWTIME, Up/Down = 893kbits/5.3mbits
                SNR = 14, Loop = 33

                24 mbit line I see! :)

                1 Reply Last reply Reply Quote 0
                • stephenw10S
                  stephenw10 Netgate Administrator
                  last edited by

                  Interesting. 5-6dB seems low for a 10Mb connection to me. Generally speaking you should be able to get a high connection speed at lower margins but at a payoff with stability. The rate adaptation should take care of that but it can be reset manually if it goes awry for whatever reason.
                  However I don't think it's the cause of the disconnects, you would see that in the system logs and the modem uptime.
                  It looks to me as though something at your ISP is sending the disconnect commands at the ppp layer, as JimP suggested in the other thread. Maybe you can capture those and present them as evidence to your ISP? I've never tried.

                  @stilez:

                  24 mbit line I see! :)

                  Yes. My connection here at home has always been very good. It should be, I can see the exchange from the window!  :) Although I have a 24Mb line speed I only get 20Mb because Plusnet is not an LLU provider.

                  As an aside there is a lot more information available from Draytek modems via the telnet interface.
                  See: http://forum.pfsense.org/index.php/topic,52091.0.html

                  Steve

                  1 Reply Last reply Reply Quote 0
                  • First post
                    Last post
                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.