Netgate Discussion Forum

Buffer errors, packet loss, latency
2.0-RC Snapshot Feedback and Problems - RETIRED
21 Posts, 6 Posters, 7.4k Views
Kevin:

Mine are X7SPE-HF boards using dual Intel 82574L NICs. Right now I have only two VoIP phones connected, registered to an external server. They lose registration after only a few minutes, even with no traffic at all.
clarknova:

I just realized my traffic shaper is failing to classify much of the traffic it used to, so bulk traffic is squeezing out interactive traffic; it all lands in the default queue for reasons unknown to me.

db
Kevin:

Still having the same issue on the latest RC1, March 2.

Is there any information I can send to help resolve this?

It passes traffic for a few minutes, then quits. Best I can tell, only the NIC stops.
sullrich:

We think there is an mbuf leak in the Intel NIC code. We used the Yandex drivers in 1.2.3, and now I am starting to wish we had done the same for RC1. It looks like we will be importing the Yandex drivers soon, so please keep an eye on the snapshot server. Hopefully by tomorrow.
sullrich:

@Kevin: Since you can reproduce this so quickly, please email me at sullrich@gmail.com and I will work with you as soon as we have a new version available.
clarknova:

@sullrich:

    We think there is an mbuf leak in the Intel NIC code. We used the Yandex drivers in 1.2.3, and now I am starting to wish we had done the same for RC1. It looks like we will be importing the Yandex drivers soon, so please keep an eye on the snapshot server. Hopefully by tomorrow.

Great news. Thanks for the update. Please let me know if I can do any testing or provide any info to help.

db
clarknova:

2.0-RC1 (amd64)
built on Thu Mar 3 19:27:51 EST 2011

Although I am thoroughly pleased with the new Yandex driver, I'm still seeing what looks like an mbuf leak. The following is from a file that has recorded 'netstat -m' every 4 hours since my last upgrade:

    [2.0-RC1]: grep "mbuf clusters" netstat-m.log
    8309/657/8966/131072 mbuf clusters in use (current/cache/total/max)
    8389/577/8966/131072 mbuf clusters in use (current/cache/total/max)
    8484/610/9094/131072 mbuf clusters in use (current/cache/total/max)
    8630/720/9350/131072 mbuf clusters in use (current/cache/total/max)
    8815/663/9478/131072 mbuf clusters in use (current/cache/total/max)
    8958/744/9702/131072 mbuf clusters in use (current/cache/total/max)
    9055/775/9830/131072 mbuf clusters in use (current/cache/total/max)
    9086/744/9830/131072 mbuf clusters in use (current/cache/total/max)
    9192/766/9958/131072 mbuf clusters in use (current/cache/total/max)
    9331/771/10102/131072 mbuf clusters in use (current/cache/total/max)
    9627/731/10358/131072 mbuf clusters in use (current/cache/total/max)
    9873/757/10630/131072 mbuf clusters in use (current/cache/total/max)

db
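For anyone wanting to reproduce this kind of capture: a sketch of a root crontab entry that appends the mbuf-cluster line every 4 hours. The log path /root/netstat-m.log and the use of cron are assumptions; only the filename and the 4-hour interval come from the post above.

    # Hypothetical root crontab entry: every 4 hours, append a timestamp and
    # the "mbuf clusters" line from netstat -m to a log file for trending.
    0 */4 * * * (date; netstat -m | grep "mbuf clusters") >> /root/netstat-m.log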
Kevin:

I am still seeing the same issues as of the March 15 snapshot. Traffic stops passing after a short time. I will upgrade again tomorrow.

Any more insight or ideas on what is happening? The box is still connected via serial port to a PC that Ermal has remote access to.
mdima:

Hello,
I am using Intel NICs on the x64 RC1 snapshots (2 x Intel PRO/1000 MT Dual Port Server Adapter + 1 Intel PRO/1000 on the motherboard) and I can confirm that my mbuf usage keeps growing. For example, right now the dashboard reads:

mbuf usage: 5153/6657

even though the firewall has only 76 active states, and the current/max numbers grow every hour. I don't have any problem related to this, though; traffic passes fine.
clarknova:

@mdima:

    I don't have any problem related to this, though; traffic passes fine.

The problem comes when the mbufs in use (the numbers you see) run into the max (which you don't see unless you run 'netstat -m' from the shell), at which point everything stops rather precipitously.

db
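As a rough illustration of that failure mode, a small sh watchdog could flag when the total approaches the max before everything stops. This is a sketch only; the script itself, its 90% threshold, and the mbufwatch logger tag are hypothetical, not from the thread.

    #!/bin/sh
    # Parse "current/cache/total/max" from the mbuf clusters line of netstat -m,
    # e.g. "5124/1270/6394/25600 mbuf clusters in use (current/cache/total/max)".
    counts=$(netstat -m | grep "mbuf clusters" | awk '{print $1}')
    total=$(echo "$counts" | cut -d/ -f3)
    max=$(echo "$counts" | cut -d/ -f4)
    # Warn via syslog once usage crosses 90% of the configured maximum.
    if [ "$total" -ge $((max * 9 / 10)) ]; then
        echo "mbuf clusters at $total of $max" | logger -t mbufwatch
    fi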
mdima:

@clarknova:

    The problem comes when the mbufs in use (the numbers you see) run into the max (which you don't see unless you run 'netstat -m' from the shell), at which point everything stops rather precipitously.

I know; I don't have problems at all. I just see that the mbuf usage is growing constantly, and that its values are very high compared with another firewall I am using (x86 RC1 with 3Com NICs, which sits around 200-300 mbufs, max 1200).

On my x64 RC1, netstat -m reports:

    5147/1510/6657 mbufs in use (current/cache/total)
    5124/1270/6394/25600 mbuf clusters in use (current/cache/total/max)

I don't know whether this is normal or not; I can only confirm that with Intel NICs on x64 these values seem to grow constantly.
clarknova:

@mdima:

    5124/1270/6394/25600 mbuf clusters in use (current/cache/total/max)

I think when that 6394 hits 25600 you will see a panic. You can raise the 25600 limit by putting

    kern.ipc.nmbclusters="131072"

(or the value of your choice) into /boot/loader.conf.local and rebooting. This uses more RAM, but not a lot more, and it buys you time between reboots.

db
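A quick way to verify the change after reboot, assuming console or SSH shell access (both commands already appear elsewhere in this thread):

    # Confirm the tunable took effect, then watch total vs. max over time.
    sysctl kern.ipc.nmbclusters
    netstat -m | grep "mbuf clusters"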
mdima:

    kern.ipc.nmbclusters="131072"

Thanks. I put this setting in System -> Advanced -> System Tunables, and now netstat -m shows 131072 as the max value.
Anyway, I hope this problem gets solved, because from what I understand, if there is a buffer leak, any value you set will be reached sooner or later...