Netgate Discussion Forum

    Issues with High Latency on PPPOE Reconnect

    General pfSense Questions
    52 Posts 3 Posters 9.5k Views
    • Flole

      Does anyone have another idea? I really have no clue what else to try or how to debug this. It should all work, but it doesn't. How could I track down the issue?

      • stephenw10 (Netgate Administrator)

        I notice you have that high CPU usage in state *if_la which I assume is truncated from if_lagg.

        What sort of lagg is that? What interfaces are in it?

        Can you remove the lagg entirely and put the VLANs over one interface directly as a test?

        It's weird though. I have a setup at home that's almost exactly like that. PPPoE connections on VLANs on an LACP LAGG.

        Steve

        • Flole

          It is an LACP lagg with all interfaces in it (3 Intels and 1 Realtek). As I mentioned before, I have already tried removing the Realtek from it, but that made no difference.

          I'll try adding the VLANs to the interface directly tomorrow; I'll have to find a time when downtime doesn't matter.

          • Flole

            I think I found the issue: it was something with the tunables. It could have been the IP redirect tunable, but I changed a few others as well before this reboot. Anyway, this is what it looks like now during a PPPoE reconnect (only lines with a response time above 1 ms are shown; the ping interval is 0.05 s):

            64 bytes from 192.168.1.1: icmp_seq=4223 ttl=64 time=79.2 ms
            64 bytes from 192.168.1.1: icmp_seq=4224 ttl=64 time=23.2 ms
            64 bytes from 192.168.1.1: icmp_seq=4225 ttl=64 time=126 ms
            64 bytes from 192.168.1.1: icmp_seq=4226 ttl=64 time=72.8 ms
            64 bytes from 192.168.1.1: icmp_seq=4227 ttl=64 time=16.8 ms
            64 bytes from 192.168.1.1: icmp_seq=4236 ttl=64 time=1.10 ms
            64 bytes from 192.168.1.1: icmp_seq=4255 ttl=64 time=65.5 ms
            64 bytes from 192.168.1.1: icmp_seq=4256 ttl=64 time=9.53 ms
            64 bytes from 192.168.1.1: icmp_seq=4257 ttl=64 time=121 ms
            64 bytes from 192.168.1.1: icmp_seq=4258 ttl=64 time=67.8 ms
            64 bytes from 192.168.1.1: icmp_seq=4259 ttl=64 time=12.3 ms
            64 bytes from 192.168.1.1: icmp_seq=4278 ttl=64 time=1.10 ms
            64 bytes from 192.168.1.1: icmp_seq=4363 ttl=64 time=2.16 ms
            64 bytes from 192.168.1.1: icmp_seq=4461 ttl=64 time=63.3 ms
            64 bytes from 192.168.1.1: icmp_seq=4462 ttl=64 time=7.35 ms
            64 bytes from 192.168.1.1: icmp_seq=4463 ttl=64 time=114 ms
            64 bytes from 192.168.1.1: icmp_seq=4464 ttl=64 time=60.7 ms
            64 bytes from 192.168.1.1: icmp_seq=4465 ttl=64 time=4.71 ms
            
            64 bytes from 192.168.1.1: icmp_seq=4496 ttl=64 time=79.2 ms
            64 bytes from 192.168.1.1: icmp_seq=4497 ttl=64 time=23.2 ms
            64 bytes from 192.168.1.1: icmp_seq=4498 ttl=64 time=125 ms
            64 bytes from 192.168.1.1: icmp_seq=4499 ttl=64 time=68.1 ms
            64 bytes from 192.168.1.1: icmp_seq=4500 ttl=64 time=12.1 ms
            64 bytes from 192.168.1.1: icmp_seq=4514 ttl=64 time=103 ms
            64 bytes from 192.168.1.1: icmp_seq=4515 ttl=64 time=47.6 ms
            64 bytes from 192.168.1.1: icmp_seq=4517 ttl=64 time=104 ms
            64 bytes from 192.168.1.1: icmp_seq=4518 ttl=64 time=48.7 ms
            

            This should be okay, right? Or is there more room for improvement?
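Editor's sketch: a listing like the one above, showing only replies slower than 1 ms, can be produced by filtering the ping output. The function name `slow_pings` is hypothetical, and the sketch assumes each reply line carries a `time=<value> ms` field as in the output shown.

```shell
# Print only ping replies slower than a threshold (in ms).
# Reads ping output on stdin; the threshold defaults to 1 ms.
slow_pings() {
    threshold="${1:-1}"
    awk -v limit="$threshold" '
        /time=/ {
            # Isolate the number that follows "time=" (for example 79.2).
            rtt = $0
            sub(/.*time=/, "", rtt)
            sub(/ .*/, "", rtt)
            if (rtt + 0 > limit + 0) print
        }
    '
}
```

It could be used as `ping -i 0.05 192.168.1.1 | slow_pings 1`. Depending on the platform, ping may buffer its output when writing to a pipe, so lines can appear in bursts.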

            • Flole

              Actually, that did not solve the issue. I rebooted pfSense and added the Intel NIC I had removed from the LACP earlier back to the bridge, and it behaved just like it used to.

              Now I have removed em2 from the LAGG again and unplugged it. I am getting those good times again, and the radvd problem is solved as well. Of course, I now want to figure out the real source of this problem. The card causing the issues is em2; pciconf -lv says:

              em2@pci0:0:25:0:        class=0x020000 card=0x1494103c chip=0x15028086 rev=0x04 hdr=0x00
                  vendor     = 'Intel Corporation'
                  device     = '82579LM Gigabit Network Connection (Lewisville)'
                  class      = network
                  subclass   = ethernet
              

              Does anybody have any idea why this card is causing such massive problems in my setup?

              • stephenw10 (Netgate Administrator)

                So you still have the other em NICs and the re NIC in that lagg when the ping latency looks good?
                Or is that with the lagg effectively removed?

                Steve

                • Flole

                  Yes, I still have the LACP containing the other 2 em NICs and the re NIC. Only em2 is no longer part of the LACP, and I've pulled out its Ethernet cable as well.

                  • stephenw10 (Netgate Administrator)

                    Curious. You could try enabling LACP debugging:
                    sysctl net.link.lagg.lacp.debug=1

                    Are those em NICs all the same type?

                    Steve

                    • Flole

                      No they are not all the same, em2 is a different one (it's the onboard NIC). The other ones are 82571EB.

                      Now that I know what causes the issue I will check if I still have one of those computers somewhere. I need a machine for testing all the stuff.

                      • Flole

                        Could this patch solve the issue? https://svnweb.freebsd.org/base?view=revision&revision=336313

                        It looks like there are known problems with that chipset causing some hangs, which is exactly what I've experienced.

                        • stephenw10 (Netgate Administrator)

                          TSO should be disabled by default in pfSense anyway in System > Advanced > Networking. But check the ifconfig output for the parent NICs to be sure.

                          Steve

                          • Flole

                            I have it enabled. Is it possible that that's what caused all the issues? I assumed that it works fine for Intel NICs. Do I understand correctly that the patch basically disables TSO, so there's no point in having it enabled on my installation?

                            • stephenw10 (Netgate Administrator)

                              It does appear to. I would definitely try that.

                              Steve

                              • Flole

                                Unfortunately I can't post the ifconfig output, as that constantly triggers the spam filter. On em2 there's nothing about TSO; on em0 and em1 there's TSO4. Also, em2 is still unplugged; I'm not sure if that matters.

                                • stephenw10 (Netgate Administrator)

                                  It may be correctly disabled on the NIC but passed through when it's added to the lagg group. Does lagg0 show TSO4?

                                  Steve
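Editor's sketch: one way to run the check discussed above is to scan the `options=` line that FreeBSD's ifconfig prints for each interface. The helper name `has_tso` is hypothetical; it reads ifconfig output on stdin and succeeds only if TSO4 or TSO6 is listed as its own capability (so `VLAN_HWTSO` alone does not count).

```shell
# Succeed if the ifconfig output on stdin lists TSO4 or TSO6 in its options line.
has_tso() {
    grep -Eq '^[[:space:]]*options=.*[<,]TSO[46][,>]'
}
```

For example, `ifconfig lagg0 | has_tso && echo "lagg0: TSO listed"` could be run for lagg0 and for each member NIC (em0, em1, em2 in this thread). If TSO shows up on one of them, `ifconfig em0 -tso` turns it off until the interface is next reconfigured.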

                                  • Flole

                                    I have tried disabling TSO globally, but that did not solve the issue. After a reboot the issue was back. Pulling the cable did not help; only removing the interface from the LAGG did.

                                    As this seems to be an issue with the specific Intel NIC, maybe someone can get one of these and look into the issue.

                                    • stephenw10 (Netgate Administrator)

                                      Are you seeing entries in the ppp log that look like this?

                                      Nov 21 16:46:35 pfsense ppp: [opt4_link0] Link: reconnection attempt 24
                                      Nov 21 16:46:35 pfsense ppp: [opt4_link0] PPPoE: can't connect "[26]:"->"mpd98452-0" and "[17e]:"->"left": No such file or directory
                                      Nov 21 16:46:35 pfsense ppp: [opt4_link0] can't remove hook mpd98452-0 from node "[26]:": No such file or directory
                                      Nov 21 16:46:35 pfsense ppp: [opt4_link0] Link: DOWN event
                                      Nov 21 16:46:35 pfsense ppp: [opt4_link0] LCP: Down event
                                      Nov 21 16:46:35 pfsense ppp: [opt4_link0] Link: reconnection attempt 25 in 4 seconds
                                      Nov 21 16:46:35 pfsense ppp: [wan_link0] Link: reconnection attempt 33
                                      Nov 21 16:46:35 pfsense ppp: [wan_link0] PPPoE: can't connect "[1b]:"->"mpd17126-0" and "[4e]:"->"left": No such file or directory
                                      Nov 21 16:46:35 pfsense ppp: [wan_link0] can't remove hook mpd17126-0 from node "[1b]:": No such file or directory
                                      Nov 21 16:46:35 pfsense ppp: [wan_link0] Link: DOWN event
                                      Nov 21 16:46:35 pfsense ppp: [wan_link0] LCP: Down event
                                      Nov 21 16:46:35 pfsense ppp: [wan_link0] Link: reconnection attempt 34 in 2 seconds
                                      

                                      And entries in the System log like this:

                                      Nov 21 16:45:17 pfsense kernel: vlan0: changing name to 'lagg0.101'
                                      Nov 21 16:45:21 pfsense kernel: vlan1: changing name to 'lagg0.102'
                                      Nov 21 16:45:21 pfsense kernel: vlan2: changing name to 'lagg0.103'
                                      

                                      There are a number of open bugs that seem likely to be related if so. Specifically:
                                      https://redmine.pfsense.org/issues/9148

                                      Steve
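Editor's sketch: to see how often each link hits the "can't connect" error quoted above, the ppp log lines can be tallied per link label. The function name `count_hook_errors` is hypothetical, and the sketch assumes the syslog-style layout shown in the excerpt, where the bracketed label (such as `[wan_link0]`) directly follows `ppp:`. It reads log lines on stdin, so it does not depend on where or how the log is stored.

```shell
# Count mpd "can't connect" errors per link label in ppp log lines read from stdin.
count_hook_errors() {
    awk '
        /PPPoE: can.t connect/ {
            # Find the field after "ppp:" and strip its surrounding brackets.
            for (i = 1; i < NF; i++) {
                if ($i == "ppp:") {
                    label = $(i + 1)
                    label = substr(label, 2, length(label) - 2)
                    count[label]++
                    break
                }
            }
        }
        END { for (label in count) print label, count[label] }
    '
}
```

Feeding it the ppp log from before and after removing a suspect interface gives a quick comparison of whether the error rate changes. The output order is not guaranteed, so pipe it through `sort` if needed.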

                                      • Flole

                                        I've had that in the logs quite a few times, but also after taking out the problematic interface, so it's nothing specific to my problem here.

                                        I have also had the ppp interface refuse to reconnect quite a few times; only a reboot helped in that case.

                                        I'm also having some trouble with states persisting across the reconnect and IP change, but none of that is new or part of the issue here.

                                        • stephenw10 (Netgate Administrator)

                                          I think there may be some underlying issue here that is causing this. We are looking into it.

                                          Steve

                                          • stephenw10 (Netgate Administrator)

                                            You may want to try this patch:
                                            https://github.com/pfsense/pfsense/commit/433a8e71f3b68c39634e11b62d8bf3d9e8ec878c.patch

                                            You can apply that using the System Patches package. Otherwise, it will be in 2.4.4p1 when that is released.

                                            It seems to have corrected all the issues I was seeing with PPPoE but they weren't identical to yours.

                                            Steve

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.