Netgate Discussion Forum

    Buffer errors, packet loss, latency

2.0-RC Snapshot Feedback and Problems - RETIRED
    21 Posts 6 Posters 7.4k Views

clarknova

4062/4865 currently, but always growing (since August). I've seen them get up to 25,000+ before the whole thing tanks (I don't know if it was a freeze, panic or something else at the time). I have a cron job that records the output of netstat -m to a file hourly, if you think it might be informative.
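
For anyone who wants to collect the same data, an /etc/crontab-style entry along these lines does an hourly capture (the log path and schedule here are just an example, not necessarily the exact entry in use):

    # example only: append a timestamped netstat -m snapshot at the top of every hour
    0 * * * * root (date; netstat -m) >> /var/log/netstat-m.log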

      db

eri--

Yes please, and show what services you are running on the box.
Also the type of NICs.

clarknova

          Intel 82574L Gigabit Ethernet x2 (em)
          Unbound and Cron packages installed
          Status: Services shows:

cron: The cron utility is used to manage commands on a schedule.
dhcpd: DHCP Service
ntpd: NTP clock sync
openvpn: OpenVPN client
unbound: Unbound is a validating, recursive, and caching DNS resolver.

          all running. Does that answer your question about which services I'm running?

I cleared the attached log when I updated to the RC, 1 day 21 hours ago.

          netstat-m.log.txt

          db

clarknova

I'm now thinking this doesn't have anything to do with uptime. Although a reboot appeared to fix the latency problems in the past, I think that may have been coincidence, because I normally do my updates at off-peak times.

            Could it be something in the way the traffic shaper works? That's what it looks like to me, as if high-priority traffic isn't getting prioritized at all.

The first screenshot shows the Quality graph. When things are working well it stays in the 40-60 range. The second graph shows throughput over the same period. Note that Quality is poor even when the upload rate is below maximum.

            You can see in the throughput graph that I lowered the WAN parent queue from 4000 kbit to 2500 kbit in an attempt to restore quality, but it appears to have had no such effect.

Attachments: latency.png, throughput.png

            db

Kevin

Sounds like about the same issue I am having. We should probably merge our threads. My box is also using the em driver; I think that is where the problem lies. I have zero packages and no traffic shaping. It is running mostly defaults.

clarknova

                @Kevin:

Sounds like about the same issue I am having. We should probably merge our threads. My box is also using the em driver; I think that is where the problem lies. I have zero packages and no traffic shaping. It is running mostly defaults.

                I just looked at your thread and I have had very similar symptoms, reported here: http://forum.pfsense.org/index.php/topic,32897.0.html

                I suspect my issues are all related, and they appear to be related to yours too. I'm using the SM X7SPA-H, incidentally.

On further examination of my packet loss symptoms, the packet loss and latency problems appear when my uplink is running at max. That would be normal with no QoS in place, but regardless of where I set the limit on my WAN parent queue, as soon as packets start to drop, they drop from queues that shouldn't be full or dropping, such as icmp and voip, which never run up to their full allocation.
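
For what it's worth, the per-queue counters can be watched from the shell with the standard pf tools, which makes it easy to see which queues are actually taking the drops (this is plain pfctl usage, nothing pfSense-specific):

    # per-queue packet, byte and drop counters; a second 'v' (pfctl -vvsq) refreshes every 5 seconds
    pfctl -vsq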

                db

Kevin

Mine is an X7SPE-HF using dual 82574L Intel NICs. Right now I only have 2 VoIP phones connected, registered to an external server. They lose registration after only a few minutes, with no traffic at all.

clarknova

I just realized my traffic shaper is failing to classify a lot of the traffic it used to, so bulk traffic is squeezing out interactive traffic: it all lands in the default queue for reasons unknown to me.

                    db

Kevin

Still having the same issue on the latest RC1, March 2.

Is there any information I can send to help resolve this?

It passes traffic for a few minutes, then quits. Best I can tell, only the NIC stops.

sullrich

We think there is an mbuf leak in the Intel NIC code. We used the Yandex drivers in 1.2.3 and now I am starting to wish we had done the same for RC1. Looks like we will be importing the Yandex drivers soon, so please keep an eye on the snapshot server. Hopefully by tomorrow.
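
Once a new snapshot is installed, the loaded driver version can be confirmed from the shell, since em(4) reports its version string when it attaches (assuming the interface is em0):

    # the attach message includes the driver version string
    dmesg | grep "^em0:"
    # the same description from the live sysctl tree
    sysctl dev.em.0.%desc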

sullrich

                          @Kevin:  Since you can reproduce this so quickly please email me at sullrich@gmail.com and I will work with you as soon as we have a new version available.

clarknova

                            @sullrich:

We think there is an mbuf leak in the Intel NIC code. We used the Yandex drivers in 1.2.3 and now I am starting to wish we had done the same for RC1. Looks like we will be importing the Yandex drivers soon, so please keep an eye on the snapshot server. Hopefully by tomorrow.

                            Great news. Thanks for the update. Please let me know if I can do any testing or provide any info to help.

                            db

clarknova

                              2.0-RC1 (amd64)
                              built on Thu Mar 3 19:27:51 EST 2011

                              Although I am thoroughly pleased with the new Yandex driver, I'm still seeing what looks like an mbuf leak. This is from a file that records 'netstat -m' every 4 hours, since my last upgrade.

                              [2.0-RC1]:grep "mbuf clusters" netstat-m.log
                              8309/657/8966/131072 mbuf clusters in use (current/cache/total/max)
                              8389/577/8966/131072 mbuf clusters in use (current/cache/total/max)
                              8484/610/9094/131072 mbuf clusters in use (current/cache/total/max)
                              8630/720/9350/131072 mbuf clusters in use (current/cache/total/max)
                              8815/663/9478/131072 mbuf clusters in use (current/cache/total/max)
                              8958/744/9702/131072 mbuf clusters in use (current/cache/total/max)
                              9055/775/9830/131072 mbuf clusters in use (current/cache/total/max)
                              9086/744/9830/131072 mbuf clusters in use (current/cache/total/max)
                              9192/766/9958/131072 mbuf clusters in use (current/cache/total/max)
                              9331/771/10102/131072 mbuf clusters in use (current/cache/total/max)
                              9627/731/10358/131072 mbuf clusters in use (current/cache/total/max)
                              9873/757/10630/131072 mbuf clusters in use (current/cache/total/max)
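
A quick way to see the growth rate from a log like this is to pull out the 'current' column and compute the increase between samples; a rough sketch against the format above (log file name assumed):

    # print the in-use count per sample, plus the growth since the previous sample
    grep "mbuf clusters" netstat-m.log | \
        awk -F/ '{ if (NR > 1) printf "%s (+%d)\n", $1, $1 - prev; else print $1; prev = $1 }'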
                              
                              

                              db

Kevin

I am still seeing the same issues as of the March 15 snapshot. Traffic stops passing after a short time. I will upgrade again tomorrow.

Any more insight or ideas on what is happening? The box is still connected via serial port to a PC Ermal has remote access to.

mdima

Hello,
I am using Intel NICs on the x64 RC1 snapshots (2 x Intel PRO/1000 MT Dual Port Server Adapter + 1 Intel PRO/1000 on the motherboard) and I can confirm that my mbuf usage is growing constantly. For example, right now the dashboard reads:
mbuf usage: 5153/6657

even though the firewall has only 76 states active, and both the current number and the total grow every hour. That said, I don't have any problem related to this, and traffic passes fine.

clarknova

                                    @mdima:

I don't have any problem related to this, and traffic passes fine.

                                    The problem comes when the mbufs in use (the numbers you see) run into the max (which you don't see, unless you run 'netstat -m' from the shell), at which point everything stops rather precipitously.
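
A quick shell check for anyone who wants to keep an eye on the headroom (the max shown by netstat -m is the kern.ipc.nmbclusters tunable):

    # current/cache/total/max on one line, without the rest of the netstat -m output
    netstat -m | grep "mbuf clusters"
    # the ceiling itself
    sysctl kern.ipc.nmbclusters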

                                    db

mdima

                                      @clarknova:

                                      @mdima:

I don't have any problem related to this, and traffic passes fine.

                                      The problem comes when the mbufs in use (the numbers you see) run into the max (which you don't see, unless you run 'netstat -m' from the shell), at which point everything stops rather precipitously.

I know; I don't have any problems at all, I just see that the mbuf count is growing constantly and that the values are very high compared with another firewall I am using (x86 RC1 with 3Com NICs, which sits at about 200-300 mbufs, max 1200).

                                      On my x64 RC1 my netstat-m reports:

                                      5147/1510/6657 mbufs in use (current/cache/total)
                                      5124/1270/6394/25600 mbuf clusters in use (current/cache/total/max)

I don't know if it's normal or not; I can just confirm that with Intel NICs on x64 these values seem to grow constantly.

clarknova

                                        @mdima:

                                        5124/1270/6394/25600 mbuf clusters in use (current/cache/total/max)

                                        I think when that 6394 hits 25600 you will see a panic. You can bump up that 25600 value by putting

                                        
                                        kern.ipc.nmbclusters="131072"
                                        
                                        

                                        (or the value of your choice) into /boot/loader.conf.local and reboot. This uses more RAM, but not a lot more, and buys you time between reboots.
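
If you'd rather do it from the shell, something along these lines works (the file is created if it doesn't already exist), and the new ceiling can be checked after the reboot:

    # append the tunable, reboot, then confirm the new max took effect
    echo 'kern.ipc.nmbclusters="131072"' >> /boot/loader.conf.local
    sysctl kern.ipc.nmbclusters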

                                        db

mdima

                                          
                                          kern.ipc.nmbclusters="131072"
                                          
                                          

Thanks, I put this setting in System > Advanced > System Tunables, and now netstat -m shows 131072 as the max value.
Anyway, I hope this problem gets solved, because from what I understand, if there is a buffer leak, any value you set will be reached sooner or later.
