Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Issues with High Latency on PPPOE Reconnect

    Scheduled Pinned Locked Moved General pfSense Questions
    52 Posts 3 Posters 9.5k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • stephenw10S
      stephenw10 Netgate Administrator
      last edited by

      Do you have State Killing on Gateway Failure set in System > Advanced > Miscellaneous?

      Steve

      F 1 Reply Last reply Reply Quote 0
      • F
        Flole
        last edited by

        Yes the PPPOE is also connected over a VLAN. It works great, just whenever I do a reconnect, shortly after the connect I am getting those packet drops. Is there any Info I can provide to help you trace the issue? In top I can see dpinger using 100% CPU at that time (refreshing doesnt work, but thats what I can see once it starts to refresh again).

        1 Reply Last reply Reply Quote 0
        • jimpJ
          jimp Rebel Alliance Developer Netgate
          last edited by

          If you kill dpinger and then edit/save/apply the gateway does it come back OK?

          Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

          Need help fast? Netgate Global Support!

          Do not Chat/PM for help!

          F 1 Reply Last reply Reply Quote 0
          • F
            Flole @stephenw10
            last edited by

            @stephenw10 No, that option is not set. When the states are killed, the SSH Sessions would also drop, and not continue to function afterwards.

            1 Reply Last reply Reply Quote 0
            • stephenw10S
              stephenw10 Netgate Administrator
              last edited by

              Try disabling gateway monitoring for the PPPoE gateway and retesting.

              Steve

              1 Reply Last reply Reply Quote 0
              • F
                Flole @jimp
                last edited by

                @jimp Theres no need to kill it, just wait for a few seconds and its all back to normal like nothing happened.

                I am getting an email with this:

                15:56:33 MONITOR: GW_WAN has high latency, omitting from routing group Main
                8.8.8.8|x.x.x.x|GW_WAN|218.804ms|634.286ms|0.0%|delay
                15:56:33 10292MONITOR: GW_WAN is available now, adding to routing group Failover
                8.8.8.8|x.x.x.x|GW_WAN|218.804ms|634.286ms|0.0%|delay
                15:56:36 MONITOR: GW_WAN has high latency, omitting from routing group Main
                8.8.8.8|x.x.x.x|GW_WAN|218.804ms|634.286ms|0.0%|delay
                15:56:36 45534MONITOR: GW_WAN is available now, adding to routing group Failover
                8.8.8.8|x.x.x.x|GW_WAN|218.804ms|634.286ms|0.0%|delay
                15:56:36 MONITOR: GW_WAN has high latency, omitting from routing group Main
                8.8.8.8|x.x.x.x|GW_WAN|218.804ms|634.286ms|0.0%|delay
                15:56:36 53386MONITOR: GW_WAN is available now, adding to routing group Failover
                8.8.8.8|x.x.x.x|GW_WAN|218.804ms|634.286ms|0.0%|delay
                15:56:36 MONITOR: GW_WAN has high latency, omitting from routing group Main
                8.8.8.8|x.x.x.x|GW_WAN|218.804ms|634.286ms|0.0%|delay
                15:56:36 10292MONITOR: GW_WAN is available now, adding to routing group Failover
                8.8.8.8|x.x.x.x|GW_WAN|218.804ms|634.286ms|0.0%|delay
                15:56:36 MONITOR: GW_WAN has high latency, omitting from routing group Main
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:36 64937MONITOR: GW_WAN is available now, adding to routing group Failover
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:36 MONITOR: GW_WAN has high latency, omitting from routing group Main
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:36 64937MONITOR: GW_WAN is available now, adding to routing group KDG
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:36 MONITOR: GW_WAN has high latency, omitting from routing group Main
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:36 45534MONITOR: GW_WAN is available now, adding to routing group Failover
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:36 MONITOR: GW_WAN has high latency, omitting from routing group Main
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:37 10292MONITOR: GW_WAN is available now, adding to routing group Failover
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:37 MONITOR: GW_WAN has high latency, omitting from routing group Main
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:37 53386MONITOR: GW_WAN is available now, adding to routing group Failover
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:37 MONITOR: GW_WAN has high latency, omitting from routing group Main
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:37 53386MONITOR: GW_WAN is available now, adding to routing group KDG
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:37 MONITOR: GW_WAN has high latency, omitting from routing group Main
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                15:56:37 64937MONITOR: GW_WAN is available now, adding to routing group Failover
                8.8.8.8|x.x.x.x|GW_WAN|204.818ms|611.325ms|0.0%|delay
                

                Also Network UPS Tools complained about the UPS being unreachable, and a few seconds later it says that its connected again. Pings to the firewall get also lost for a few seconds, and after some time it's all back to normal.

                1 Reply Last reply Reply Quote 0
                • jimpJ
                  jimp Rebel Alliance Developer Netgate
                  last edited by

                  Killing and restarting dpinger was more about the 100% CPU issue and not the latency. That may be closer to the root cause.

                  Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                  Need help fast? Netgate Global Support!

                  Do not Chat/PM for help!

                  1 Reply Last reply Reply Quote 0
                  • F
                    Flole
                    last edited by

                    Dpinger is going back to normal CPU Usage after a few seconds. Also Avahi is using lots of CPU on a reconnect when its enabled.

                    I disabled the Gateway monitoring and also marked it as permanent down. Dpinger still had a high CPU, as well as Avahi. So i disabled avahi aswell, now I got a high CPU Usage from dhcpd and route for a second during the reconnect, and the issue is still there.

                    1 Reply Last reply Reply Quote 0
                    • F
                      Flole
                      last edited by

                      I just noticed that with avahi disabled it gets a little better. My normal püing to pfsense is 0.2, on a PPPOE reconnect with avahi disabled I am getting 2 pings with 60-100ms. I need avahi to do printer sharing betweem the Server in the DMZ and the printers on the LAN though.

                      1 Reply Last reply Reply Quote 0
                      • stephenw10S
                        stephenw10 Netgate Administrator
                        last edited by

                        What hardware are you running this on?

                        Steve

                        1 Reply Last reply Reply Quote 0
                        • F
                          Flole
                          last edited by

                          pciconf -l | grep class=0x020000
                          em2@pci0:0:25:0:        class=0x020000 card=0x1494103c chip=0x15028086 rev=0x04 hdr=0x00
                          em0@pci0:1:0:0: class=0x020000 card=0x7044103c chip=0x105e8086 rev=0x06 hdr=0x00
                          em1@pci0:1:0:1: class=0x020000 card=0x7044103c chip=0x105e8086 rev=0x06 hdr=0x00
                          re0@pci0:3:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x06 hdr=0x00
                          

                          I also tried unplugging that Realtek PCI Express Card from the Switch so it's no longer an active slave in the bonding, but that didn't help. One of the Intel NICs is an onboard NIC, the other 2 a Dual NIC PCI Express card. Please let me know if thats enough information or if I have to open the machine and get the exact model number from the PCI card.

                          1 Reply Last reply Reply Quote 0
                          • jimpJ
                            jimp Rebel Alliance Developer Netgate
                            last edited by

                            Do you have any Advanced Configuration options active on the DHCP WAN? Maybe that's the one causing the issue, not PPPoE. See https://redmine.pfsense.org/issues/8507#note-23

                            Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                            Need help fast? Netgate Global Support!

                            Do not Chat/PM for help!

                            1 Reply Last reply Reply Quote 0
                            • F
                              Flole
                              last edited by

                              I have tried disabling IPv6 on the PPPOE Link, so that it's no longer using DHCP but that did not help. The IPv4 is aquired over PPPOE, there's nothing else I can configure in there.

                              1 Reply Last reply Reply Quote 0
                              • stephenw10S
                                stephenw10 Netgate Administrator
                                last edited by

                                Ah, OK so you don't actually have two WAN connections just two gateways, one v4 and one v6?

                                If that's the case you would expect the dhcp v6 gateway to go down when the parent PPPoE link reconnects.

                                Earlier I meant what CPU and RAM etc are you running. I was hoping to judge what sort of CPU usage might be expected there.

                                Steve

                                1 Reply Last reply Reply Quote 0
                                • F
                                  Flole
                                  last edited by

                                  No I do have 2 Internet connections with 2 Gateways each. I'll call them WAN and DSL from now on to make clear which one I mean.

                                  On WAN I have a static IPv4 Configured and I am using DHCP for IPv6.

                                  On DSL I use PPPOE for IPv4 Configuration and DHCP for IPv6.

                                  The CPU is an Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz with 16GB of RAM

                                  1 Reply Last reply Reply Quote 0
                                  • stephenw10S
                                    stephenw10 Netgate Administrator
                                    last edited by

                                    Ah OK.
                                    Hmm, I'm not sure if that would apply at all to v6, I don't think it does.

                                    That hardware is clearly more than capable of handling WAN events without a problem. Or should be least.

                                    Disabling gateway monitoring and/or disabling gateway monitor action on each of those could be telling.

                                    Steve

                                    1 Reply Last reply Reply Quote 0
                                    • F
                                      Flole
                                      last edited by

                                      I've disabled the monitoring action on all Gateways. With Avahi and snort disabled it looks like this after PPPOE is connected:

                                      64 bytes from 192.168.1.1: icmp_seq=715 ttl=64 time=0.225 ms
                                      64 bytes from 192.168.1.1: icmp_seq=716 ttl=64 time=0.236 ms
                                      64 bytes from 192.168.1.1: icmp_seq=717 ttl=64 time=1996 ms
                                      64 bytes from 192.168.1.1: icmp_seq=718 ttl=64 time=973 ms
                                      64 bytes from 192.168.1.1: icmp_seq=719 ttl=64 time=2895 ms
                                      64 bytes from 192.168.1.1: icmp_seq=720 ttl=64 time=1879 ms
                                      64 bytes from 192.168.1.1: icmp_seq=721 ttl=64 time=855 ms
                                      64 bytes from 192.168.1.1: icmp_seq=722 ttl=64 time=32.5 ms
                                      64 bytes from 192.168.1.1: icmp_seq=723 ttl=64 time=0.297 ms
                                      64 bytes from 192.168.1.1: icmp_seq=724 ttl=64 time=0.213 ms
                                      64 bytes from 192.168.1.1: icmp_seq=725 ttl=64 time=0.166 ms
                                      64 bytes from 192.168.1.1: icmp_seq=726 ttl=64 time=2758 ms
                                      64 bytes from 192.168.1.1: icmp_seq=727 ttl=64 time=1734 ms
                                      64 bytes from 192.168.1.1: icmp_seq=728 ttl=64 time=710 ms
                                      64 bytes from 192.168.1.1: icmp_seq=729 ttl=64 time=772 ms
                                      64 bytes from 192.168.1.1: icmp_seq=730 ttl=64 time=1637 ms
                                      64 bytes from 192.168.1.1: icmp_seq=731 ttl=64 time=628 ms
                                      64 bytes from 192.168.1.1: icmp_seq=732 ttl=64 time=0.299 ms
                                      64 bytes from 192.168.1.1: icmp_seq=733 ttl=64 time=0.310 ms
                                      64 bytes from 192.168.1.1: icmp_seq=734 ttl=64 time=0.339 ms
                                      

                                      The following lines all have a latency <1ms.

                                      With avahi disabled and snort enabled it looks like this:

                                      64 bytes from 192.168.1.1: icmp_seq=30 ttl=64 time=0.337 ms
                                      64 bytes from 192.168.1.1: icmp_seq=31 ttl=64 time=2198 ms
                                      64 bytes from 192.168.1.1: icmp_seq=32 ttl=64 time=1174 ms
                                      64 bytes from 192.168.1.1: icmp_seq=33 ttl=64 time=151 ms
                                      64 bytes from 192.168.1.1: icmp_seq=34 ttl=64 time=443 ms
                                      64 bytes from 192.168.1.1: icmp_seq=35 ttl=64 time=2097 ms
                                      64 bytes from 192.168.1.1: icmp_seq=36 ttl=64 time=1090 ms
                                      64 bytes from 192.168.1.1: icmp_seq=37 ttl=64 time=66.9 ms
                                      64 bytes from 192.168.1.1: icmp_seq=38 ttl=64 time=0.310 ms
                                      64 bytes from 192.168.1.1: icmp_seq=39 ttl=64 time=0.305 ms
                                      64 bytes from 192.168.1.1: icmp_seq=40 ttl=64 time=2061 ms
                                      64 bytes from 192.168.1.1: icmp_seq=41 ttl=64 time=1037 ms
                                      64 bytes from 192.168.1.1: icmp_seq=42 ttl=64 time=13.8 ms
                                      64 bytes from 192.168.1.1: icmp_seq=43 ttl=64 time=91.6 ms
                                      64 bytes from 192.168.1.1: icmp_seq=44 ttl=64 time=0.270 ms
                                      64 bytes from 192.168.1.1: icmp_seq=45 ttl=64 time=386 ms
                                      64 bytes from 192.168.1.1: icmp_seq=46 ttl=64 time=0.260 ms
                                      

                                      The following lines all have a latency <1ms.

                                      It seems like neither one has an effect on this. I would rather assume that this is somehow related to the IPv6 Latency spikes I am experiencing when using DHCPv6 RA functionality, see this thread for the issue description of that. Is there any way of troubleshooting like for example seeing what kernel module uses how much CPU time or something similar?

                                      1 Reply Last reply Reply Quote 0
                                      • stephenw10S
                                        stephenw10 Netgate Administrator
                                        last edited by

                                        If you have top -aSH open in the console you should see what processes are using CPU cycles.
                                        Is it just dpinger?
                                        If you actually disable monitoring for all gateways dpinger shouldn't be running at all so that would reveal anything else.

                                        Steve

                                        1 Reply Last reply Reply Quote 0
                                        • F
                                          Flole
                                          last edited by

                                          Thats interesting..... I re-enabled dpinger as I think it's clearly not the cause.

                                          PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
                                             12 root       -92    -     0K   336K CPU2    2  57:12  66.62% [intr{irq268: re0}]
                                             12 root       -92    -     0K   336K RUN     1   5:40  62.03% [intr{swi4: clock (0)}]
                                              0 root       -92    -     0K   480K *in6_m  0  28:45  50.42% [kernel{em2 que}]
                                             11 root       155 ki31     0K    64K RUN     2  23.5H  27.08% [idle{idle: cpu2}]
                                          27976 root        34    0  6900K  2464K nanslp  2   0:01  21.58% [dpinger{dpinger}]
                                             11 root       155 ki31     0K    64K RUN     0  24.3H  18.74% [idle{idle: cpu0}]
                                              0 root       -92    -     0K   480K CPU0    0  21:12  18.15% [kernel{em1 que}]
                                          29112 root        33    0  6900K  2508K nanslp  2   0:01  17.51% [dpinger{dpinger}]
                                             11 root       155 ki31     0K    64K RUN     1  24.5H  16.80% [idle{idle: cpu1}]
                                          

                                          Is the realtek Interface the one causing the issues? Right now it is connected as part of the LAGG, would disconnecting help, or removing it from the LAGG or can just removing it physically from the machine show if its the cause?

                                          1 Reply Last reply Reply Quote 0
                                          • stephenw10S
                                            stephenw10 Netgate Administrator
                                            last edited by

                                            Well if anything is it's probably the Realtek. You should be able to just remove it from the LAGG unless it's causing some terrrible PCI bus issue.

                                            What throughput are you seeing there? Those numbers look high for a 3.3GHz i3.

                                            Steve

                                            1 Reply Last reply Reply Quote 0
                                            • First post
                                              Last post
                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.