Netgate Discussion Forum

runaway delay average and std. dev. on WAN

General pfSense Questions
29 Posts 2 Posters 1.3k Views
  • P
    papaMURKS
    last edited by Aug 8, 2024, 1:58 PM

    Hello,

    SG-3100 running pfsense+ 24.03

    In Status -> Monitoring for my WAN_DHCP, the average and standard deviation of the delay slowly start to 'run away' (increase, apparently exponentially) after a Router/Gateway (RG) restart, for an unknown reason. I replaced my RG from AT&T in an attempt to resolve this issue, and I now have the BGW320-505. I am running that RG in "Passthrough" mode (quotes used because I'm aware it's not true passthrough).

    This condition results in poor performance on the network. My goal in posting is to eventually eliminate this issue (if it's on my side) but also to learn some skills about how I should go about investigating this issue.

    If it's not already clear, I'm just an aspiring 'pro-sumer' and am open to learning. Really appreciate any help.

    Below is a screenshot of the last 30 days.
    c86febcf-8d7e-47f7-bcd1-07695693616e-image.png

    Below is a graph of the last year. The RG was replaced in ~mid-May:
    d56ffcd8-040e-4076-9192-2734eb02a6ba-image.png

    Thank you!!

    • S
      stephenw10 Netgate Administrator
      last edited by Aug 8, 2024, 4:47 PM

      Hmm, that's interesting. So before March you were running the same setup with no such issues?

      What resets this? Rebooting the 3100? Rebooting the AT&T box? Reconnecting the cable?

      Do the monitoring graphs show similar increases for CPU or memory usage?

      Steve

      • P
        papaMURKS @stephenw10
        last edited by Aug 8, 2024, 8:00 PM

        @stephenw10 said in runaway delay average and std. dev. on WAN:

        Hmm, that's interesting. So before March you were running the same setup with no such issues?

        What resets this? Rebooting the 3100? Rebooting the AT&T box? Reconnecting the cable?

        Do the monitoring graphs show similar increases for CPU or memory usage?

        Steve

        Before March I was running a BGW320-500, but all other hardware has remained the same.

        The condition is 'reset' by rebooting the AT&T box. (probably worth saying that idk if this actually resets it - due to the apparent exponential nature getting compounded, the issue may be present immediately following restart, albeit to a negligible extent).

        I reviewed all other graph types and there is no apparent correlation to CPU, memory, or anything else. (You do see state changes coinciding with the reboot but I think that is correlation and not causation).

        • S
          stephenw10 Netgate Administrator @papaMURKS
          last edited by Aug 8, 2024, 8:14 PM

          @papaMURKS said in runaway delay average and std. dev. on WAN:

          The condition is 'reset' by rebooting the AT&T box. (probably worth saying that idk if this actually resets it - due to the apparent exponential nature getting compounded, the issue may be present immediately following restart, albeit to a negligible extent).

          Hmm, I would try reconnecting the cable between the AT&T device and 3100. See if that resets anything.

          Then try reconnecting the upstream incoming WAN cable if you can. See if that has any effect.

          If rebooting the AT&T router resets it though that seems like an issue there.

          • P
            papaMURKS @stephenw10
            last edited by Aug 8, 2024, 8:20 PM

            @stephenw10 said in runaway delay average and std. dev. on WAN:

            @papaMURKS said in runaway delay average and std. dev. on WAN:

            The condition is 'reset' by rebooting the AT&T box. (probably worth saying that idk if this actually resets it - due to the apparent exponential nature getting compounded, the issue may be present immediately following restart, albeit to a negligible extent).

            Hmm, I would try reconnecting the cable between the AT&T device and 3100. See if that resets anything.

            Then try reconnecting the upstream incoming WAN cable if you can. See if that has any effect.

            If rebooting the AT&T router resets it though that seems like an issue there.

            Thanks for the suggestion. Unfortunately I won't be able to determine if there is any effect using this method until the behavior presents itself again.

            I am open to any other investigation or troubleshooting techniques that may help identify the issue until then :)

            • S
              stephenw10 Netgate Administrator
              last edited by Aug 8, 2024, 8:26 PM

              Do you actually see that delay against external hosts? Is your WAN monitoring using the WAN IP directly?

              • P
                papaMURKS @stephenw10
                posted Aug 8, 2024, 8:36 PM, last edited by papaMURKS Aug 8, 2024, 8:41 PM

                @stephenw10 said in runaway delay average and std. dev. on WAN:

                Do you actually see that delay against external hosts? Is your WAN monitoring using the WAN IP directly?

                I might show my ignorance in this response, but I'll take a crack at it:

                The monitoring is not monitoring my WAN gateway directly. (i.e., if my WAN IP is 55.55.55.55, my WAN gateway is 55.55.55.1, and I'm monitoring 123.45.67.1)

                I gathered that monitoring address by doing a Traceroute and selected the first hop after the local IP of the RG. (i.e., 1st hop is 192.168.1.254, and 2nd hop is 123.45.67.1)

                I'm not sure how to answer whether I see the delay against external hosts.

                • S
                  stephenw10 Netgate Administrator
                  last edited by Aug 8, 2024, 9:06 PM

                  Well in part of that graph you are seeing ping latency >100ms. So if you ping, for example, 8.8.8.8 you will very clearly see that. If you only see it against the monitoring target that implies something other than just a delay in the route may be happening.
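One way to make that comparison concrete is to compute the same two statistics for each target and put them side by side. The snippet below is a minimal sketch, not how pfSense's own monitoring works: it assumes ping output in the usual `time=... ms` form, uses a population standard deviation, and the `rtt_stats` helper and the sample text are made up for illustration.

```python
import re
import statistics

# Matches the "time=12.3 ms" field printed by common ping implementations.
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def rtt_stats(ping_output):
    """Return (mean_ms, stddev_ms, sample_count) for the RTTs in ping output."""
    rtts = [float(value) for value in RTT_PATTERN.findall(ping_output)]
    if not rtts:
        raise ValueError("no RTT samples found")
    return statistics.mean(rtts), statistics.pstdev(rtts), len(rtts)


# Made-up sample output, for illustration only (not taken from the thread).
SAMPLE = """\
64 bytes from 8.8.8.8: icmp_seq=0 ttl=117 time=10.0 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=20.0 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=30.0 ms
"""

if __name__ == "__main__":
    mean_ms, stddev_ms, count = rtt_stats(SAMPLE)
    print(f"{count} samples, mean {mean_ms:.1f} ms, std. dev. {stddev_ms:.1f} ms")
```

Running it once on the output for 8.8.8.8 and once on the output for the monitoring target gives two (mean, std. dev.) pairs to compare.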

                  • P
                    papaMURKS @stephenw10
                    posted Aug 9, 2024, 12:51 PM, last edited by papaMURKS Aug 9, 2024, 12:53 PM

                    @stephenw10 yes, there is definitely a delay when reaching external hosts. It's noticeable when pinging an IP directly, as well as when pinging a domain. (I use Unifi Wifiman, which has a neat little UI for monitoring pings in real time to facebook, google, and x, and I added 8.8.8.8 and 1.1.1.1 as well. By contrast, pings to my local gateway are normal, ~4ms.)

                    • S
                      stephenw10 Netgate Administrator
                      last edited by Aug 9, 2024, 1:03 PM

                      Ok cool. Then I guess wait for it to grow to something clearly visible, then try to reset it without rebooting the AT&T router. If nothing else resets it, the issue pretty much has to be there.

                      • P
                        papaMURKS @stephenw10
                        last edited by Aug 13, 2024, 2:01 PM

                        @stephenw10 so, unexpected behavior...
                        69c0d1b2-106e-482e-92c4-5b3d8fe78c47-image.png
                        above graph is last 2 days...

                        around 5pm on 8/11 my rtt ping statistics improved drastically (i.e., to expected levels and consistent with times immediately following a RG reboot) with NO (known) INTERVENTION BY ME

                        in reviewing my logs, i see a HUGE amount of arpresolve logs in the times leading up to and following the good RTT pings:

                        7d4b96cd-2b54-4ed6-b05d-05269c8d27f7-image.png

                        For context, the log view in the above image only displays 500 lines, and the first line starts at Aug 11 @ 16:45. So this created ~480 entries between 16:45 and 16:52...

                        192.168.1.254 is the LAN address of the RG. I don't recall seeing this message before, and almost certainly not to this extent.
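To put a number on a burst like this, the entries can be counted per minute. The following is a rough sketch that assumes syslog-style lines beginning with a timestamp such as `Aug 11 16:47:17` and containing the text `arpresolve: can't allocate llinfo`; the `count_per_minute` helper is hypothetical, and every sample line except the first uses made-up timestamps.

```python
from collections import Counter

MARKER = "arpresolve: can't allocate llinfo"


def count_per_minute(lines):
    """Count matching log lines, grouped by the minute in their timestamp."""
    counts = Counter()
    for line in lines:
        if MARKER in line:
            # Timestamps look like "Aug 11 16:47:17"; the first 12
            # characters ("Aug 11 16:47") identify the minute.
            counts[line[:12]] += 1
    return counts


# Illustrative input: only the first line's timestamp is real; the rest are
# made up to show the grouping.
SAMPLE_LINES = [
    "Aug 11 16:47:17 \tkernel \t\tarpresolve: can't allocate llinfo for 192.168.1.254 on mvneta2",
    "Aug 11 16:47:18 \tkernel \t\tarpresolve: can't allocate llinfo for 192.168.1.254 on mvneta2",
    "Aug 11 16:48:01 \tkernel \t\tarpresolve: can't allocate llinfo for 192.168.1.254 on mvneta2",
    "Aug 11 16:47:21 \tphp-fpm \t836 \t/rc.newwanip: Gateway, NONE AVAILABLE",
]

if __name__ == "__main__":
    for minute, count in sorted(count_per_minute(SAMPLE_LINES).items()):
        print(minute, count)
```

A sudden jump in the per-minute count marks when the burst started, which can then be lined up against the RTT graph.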

                        • S
                          stephenw10 Netgate Administrator
                          last edited by Aug 13, 2024, 4:23 PM

                          Hmm, seems like it might have rebooted? Being unable to allocate local link info like that generally means pfSense doesn't have an IP in that subnet. So, for example, it lost its DHCP lease or the WAN went down.

                          Though I'd expect to see some monitoring ping failures if that was the case.

                          • P
                            papaMURKS @stephenw10
                            posted Aug 13, 2024, 8:52 PM, last edited by papaMURKS Aug 13, 2024, 8:53 PM

                            @stephenw10

                            Aug 11 16:47:17 	kernel 		arpresolve: can't allocate llinfo for 192.168.1.254 on mvneta2
                            Aug 11 16:47:20 	php-fpm 	836 	/rc.newwanip: Removing static route for monitor [FIRST HOP] and adding a new route through [WAN GATEWAY]
                            Aug 11 16:47:21 	php-fpm 	836 	/rc.newwanip: Gateway, NONE AVAILABLE
                            Aug 11 16:47:22 	php-fpm 	836 	/rc.newwanip: Gateway, NONE AVAILABLE
                            Aug 11 16:47:22 	php-fpm 	836 	/rc.newwanip: IP Address has changed, killing states on former IP Address 0.0.0.0. 
                            

                            also

                            Aug 11 16:49:41 	php-fpm 	836 	/rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 0.0.0.0 -> [WAN IP] - Restarting packages. 
                            

                            does this look like the WAN went down and was recovered?

                            EDIT: extra info, the RG renews the DHCP lease for the pfsense appliance every 24 hours

                            • S
                              stephenw10 Netgate Administrator
                              posted Aug 13, 2024, 9:00 PM, last edited by stephenw10 Aug 13, 2024, 9:00 PM

                              Does the DHCP log show anything for the dhclient at that time?

                              It should renew without any interruption but clearly it lost an IP entirely at one point.

                              • P
                                papaMURKS @stephenw10
                                last edited by Aug 14, 2024, 12:03 AM

                                @stephenw10

                                well, my DHCP log is flooded with hundreds of dhcpd entries so the log only goes back to the last hour. Most are DHCPREQUESTs and DHCPACK for LAN devices and their MAC addresses. also dhcp lease renew and ipv6 advertise address entries.

                                there are no entries for dhclient

                                • S
                                  stephenw10 Netgate Administrator
                                  last edited by Aug 14, 2024, 2:47 PM

                                  You can filter that for the dhclient process:
                                  Screenshot from 2024-08-14 15-48-12.png
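The same kind of filtering can be done offline on a saved copy of the log. This is only a sketch under one assumption: that the lines are tab-separated (time, process, PID, message) the way log entries pasted in this thread are. The `process_name` and `filter_by_process` helpers are hypothetical, and the `dhcpd` sample line is made up.

```python
def process_name(line):
    """Return the process column of a tab-separated log line, or ''."""
    fields = [field.strip() for field in line.split("\t")]
    return fields[1] if len(fields) > 1 else ""


def filter_by_process(lines, name):
    """Keep only lines whose process column is exactly `name`."""
    return [line for line in lines if process_name(line) == name]


# Illustrative input; the dhcpd entry is invented for the example.
SAMPLE_LINES = [
    "Aug 11 16:45:02 \tdhclient \t34520 \tbound to 192.168.1.64 -- renewal in 15 seconds.",
    "Aug 11 16:45:03 \tdhcpd \t12345 \tDHCPACK on 192.168.10.20 (made-up entry)",
    "Aug 11 16:47:17 \tkernel \t\tarpresolve: can't allocate llinfo for 192.168.1.254 on mvneta2",
]

if __name__ == "__main__":
    for line in filter_by_process(SAMPLE_LINES, "dhclient"):
        print(line)
```

Matching the process column exactly, rather than searching for a substring, keeps `dhcpd` entries for LAN clients out of the `dhclient` results.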

                                  • P
                                    papaMURKS @stephenw10
                                    posted Aug 14, 2024, 5:22 PM, last edited by papaMURKS Aug 14, 2024, 5:23 PM

                                    @stephenw10 thanks! attached are the dhclient logs (forum flagged the pasted logs as spam...)
                                    dhcplogs.txt

                                    • S
                                      stephenw10 Netgate Administrator
                                      last edited by Aug 14, 2024, 6:03 PM

                                      Hmm, well the only thing there is that at that point the logs show it pulled a private IP:

                                      Aug 11 16:45:02 	dhclient 	34520 	bound to 192.168.1.64 -- renewal in 15 seconds.
                                      

                                      That is usually a sign that the modem lost its upstream connection and started handing out IPs itself. So if that did happen here, it implies the line issues were reset by that upstream link reset/resync.
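Spotting such leases in a long dhclient log can be automated by checking each bound address against the RFC 1918 private ranges. The sketch below assumes lines containing `bound to <IPv4 address>` as in the entry above; the helper names are hypothetical, and the second sample line is made up, reusing the placeholder WAN IP 55.55.55.55 from earlier in the thread.

```python
import ipaddress
import re

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

BOUND_PATTERN = re.compile(r"bound to (\d+\.\d+\.\d+\.\d+)")


def is_rfc1918(address):
    """Return True if the IPv4 address falls in an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def private_leases(lines):
    """Return the log lines where dhclient bound to a private address."""
    hits = []
    for line in lines:
        match = BOUND_PATTERN.search(line)
        if match and is_rfc1918(match.group(1)):
            hits.append(line)
    return hits


# The second line is a made-up example of a normal public-IP lease.
SAMPLE_LINES = [
    "Aug 11 16:45:02 \tdhclient \t34520 \tbound to 192.168.1.64 -- renewal in 15 seconds.",
    "Aug 11 16:49:40 \tdhclient \t34520 \tbound to 55.55.55.55 -- renewal in 300 seconds.",
]

if __name__ == "__main__":
    for line in private_leases(SAMPLE_LINES):
        print(line)
```

Any line this returns marks a moment when the WAN interface held a private address instead of the passed-through public one.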

                                      • P
                                        papaMURKS @stephenw10
                                        posted Aug 16, 2024, 5:44 PM, last edited by papaMURKS Aug 16, 2024, 5:45 PM

                                        @stephenw10 can you please clarify, do you mean it's a sign that the RG lost its upstream connection?

                                        and by line issues, do you mean att -> my house ONT, ONT -> RG, or RG -> pfsense?

                                        if the RG is handing out IPs itself, does that create a problem? (i believe it's possible for me to disable DHCP server in the RG if that could be the source of the issues...) it hasn't handed out any IPs except passing the WAN IP to pfsense.

                                        d60538e2-48c5-4f47-b033-1d1ae1ac6fbf-image.png

                                        • S
                                          stephenw10 Netgate Administrator
                                          last edited by Aug 16, 2024, 5:54 PM

                                          I mean something upstream of the AT&T router/gateway. Those usually only hand out private IPs themselves when they can't connect to the upstream server.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.