Device Polling and Interrupts for Intel Pro NIC
-
I thought I would also post some stats with polling enabled for comparison:
netstat -I em1 -w 10
input (em1) output
packets errs bytes packets errs bytes colls
64070 5632 71219252 54265 0 10423132 0
64021 5725 71218808 53515 0 9782787 0
64250 5802 70964594 53940 0 10376166 0
64265 5456 71149400 53923 0 10326147 0
64199 5632 70746138 53750 0 10332519 0
63959 6994 70109209 54463 0 10769424 0
64659 6472 72257534 54529 0 10901799 0
vmstat 5
procs memory page disk faults cpu
r b w avm fre flt re pi po fr sr ad2 in sy cs us sy id
0 3 1 103652 429576 140 0 0 0 135 0 0 5283 1691 7886 1 69 30
0 3 0 103652 429576 17 0 0 0 18 0 0 2795 780 3101 1 32 68
0 3 0 103656 429572 17 0 0 1 17 0 3 2805 629 3056 0 34 66
0 3 0 103660 429568 17 0 0 0 18 0 4 2819 661 3102 1 33 66
So with polling enabled the packet loss becomes massive, though CPU usage drops dramatically.
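For context, on FreeBSD 6.x polling is toggled per interface, assuming the kernel was built with options DEVICE_POLLING; this is the usual way it would have been switched on and off for these tests:
Enable polling on the interface:
ifconfig em1 polling
Revert to interrupt-driven operation:
ifconfig em1 -polling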
Interestingly enough, however:
sysctl -a | grep burst
kern.polling.burst: 7
kern.polling.burst_max: 150
So it does not look like any significant number of packets is fetched per polling interval.
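If the small burst is the limiter, the polling knobs can be raised; a sketch with illustrative, untested values:
Let each polling pass fetch more packets from the ring:
sysctl kern.polling.burst_max=1000
Reserve less of each tick for userland, leaving more time for the polling loop (default is 50):
sysctl kern.polling.user_frac=20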
-
Interrupt levels look okay. Wonder if you are running out of PCI bandwidth?
-
It is just 120-150 Mbit/s, which is not a lot. The specs for this network card mention a much larger capacity.
But anyway, it looks like I can see why the packet loss is happening. The loss occurs every 10 seconds, which seems to correspond to the state purge interval of the pf filter.
My current theory is that some lock taken during this process blocks fetching packets from the card buffer, which causes the packet loss.
I have about 70,000 active states.
Now I'm looking for a way to configure this so it works better - can it be configured so that the hash table is not purged all at once, but in portions (which would reduce the lock time)?
Otherwise, maybe I can increase some kernel-level buffers so packets do not have to sit in the network card buffer while the purge is active.
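The state count and the current purge settings can be double-checked with standard pfctl queries:
Show state table counters, including the number of active states:
pfctl -si
Show the configured timeouts, including the purge "interval":
pfctl -st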
-
Let me ask Bill. He has the most experience with these large installations.
-
Please try this and see if it helps.
Edit /tmp/rules.debug and add this to the top:
set timeout interval 1
set timeout { tcp.finwait 10, tcp.closed 5 }
Now run this from a shell: pfctl -f /tmp/rules.debug
Please test with these settings and let me know if it is better. We might have to make this a hidden variable. Or something.
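To confirm the new values actually took effect after the reload, they can be read back:
pfctl -f /tmp/rules.debug
pfctl -st | grep -E 'interval|finwait|closed'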
-
This does not help but it does change things:
netstat -I em1 -w 1
input (em1) output
packets errs bytes packets errs bytes colls
9995 249 12152731 6990 0 1242327 0
10572 227 12847133 7346 0 1259203 0
10748 359 13196148 7666 0 1305645 0
11388 310 14000169 8052 0 1397665 0
11842 259 14595778 8389 0 1464044 0
So now we have the same packet loss every second instead of once per 10 seconds.
It looks like purging happens over the whole table rather than in pieces.
Changing it to do purging once per 60 seconds gives this:
124838 0 156441927 84043 0 12043484 0
117116 897 146381898 80182 0 11669285 0
116284 0 144356067 80062 0 11898673 0
(10 second increments)
With purging once per 60 seconds, more packets are lost in each spike, but fewer on average.
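For reference, that test is just a different value on the same line in /tmp/rules.debug, reloaded with pfctl -f /tmp/rules.debug:
set timeout interval 60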
Interesting - isn't there some kernel buffer one can increase to avoid this problem?
It looks very strange to me that purging the state table blocks the network device buffer from being processed.
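One buffer that might be worth enlarging, assuming this em(4) driver version supports the tunables, is the descriptor ring size, set at boot in /boot/loader.conf (the 4096 here is illustrative):
hw.em.rxd="4096"
hw.em.txd="4096"
Bigger rings let more packets queue in the NIC while the host is busy holding the pf lock.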
-
You did test this with polling off, right?
-
Sure. Polling is off and I'm not even trying to turn it on because things become so much worse…
-
Okay, I will run this by Bill and I have emailed Max Laier who might have some tuning advice for us.
Just for the record what kind of bandwidth are you pushing? Can you share RRD bandwidth and packet graphs?
-
Can you send us the output of:
sysctl net.inet.ip.intr_queue_drops
And maybe increase net.inet.ip.intr_queue_maxlen
sysctl net.inet.ip.intr_queue_maxlen=250
And let me know if that helps.
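A minimal way to apply that and keep it across reboots, assuming a stock /etc/sysctl.conf is honored on this install:
sysctl net.inet.ip.intr_queue_maxlen=250
echo 'net.inet.ip.intr_queue_maxlen=250' >> /etc/sysctl.conf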
Also…send the output of:
sysctl net.isr
Thanks
–Bill
-
Yes, please provide the outputs that Bill is requesting.
Once you have provided that output and tried upping the sysctl, if we still have not made any progress, I have a patch from Max that might help. If we get to that point I will compile a custom test kernel for you.
-
sysctl net.isr
net.isr.direct: 0
net.isr.count: 444903
net.isr.directed: 0
net.isr.deferred: 444903
net.isr.queued: 321
net.isr.drop: 0
net.isr.swi_count: 411596
sysctl net.inet.ip.intr_queue_drops
net.inet.ip.intr_queue_drops: 0
After queue was increased to 250:
netstat -I em1 -w 100
input (em1) output
packets errs bytes packets errs bytes colls
638808 312 765089702 450895 0 70931893 0
668226 951 808850726 458403 0 63521833 0
So we're still losing packets; the number is lower, but traffic has also dropped about 30% from the day's high, so the problem did not go away.
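One thing that stands out in the net.isr output above: count equals deferred, so every inbound packet is queued to the netisr software interrupt thread rather than processed in the receiving context. If this kernel supports it, direct dispatch might be worth a try (a stock FreeBSD sysctl; whether it helps here is an assumption):
sysctl net.isr.direct=1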
-
Thanks Scott,
I tried this kernel and unfortunately with it the system crashes within 2 to 15 minutes of boot.
There is still packet loss:
netstat -I em1 -w 10
input (em1) output
packets errs bytes packets errs bytes colls
125527 73 145482529 89756 0 8569116 0
125283 0 144453737 90196 0 8797933 0
114783 0 128255459 82857 0 8698382 0
112750 109 125654284 81726 0 8263655 0
112577 0 125250797 81648 0 7740500 0
113482 0 124736359 81646 0 7689588 0
110745 0 121549353 80587 0 8305343 0
111846 39 121826640 81538 0 8449537 0
This is with the default 10 second purge interval.
By the way, this box is running pfSense 1.0.1 - would you recommend upgrading to 1.2.0RC1, and is it safe enough to do on a remote box?
Also, I think it would be very handy to add a feature for entering advanced pf settings somewhere so they are kept in the config.
Changing the purge interval and other advanced settings is very inconvenient now, as rules.debug is always recreated from scratch.
From a shell, run this (testing a new kernel):
Back up the old kernel:
cp /boot/kernel/kernel.gz ~root/
fetch -o /boot/kernel/kernel.gz http://www.pfsense.com/~sullrich/kernel.gz
shutdown -r now
This will reboot your box, so be prepared.
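If the test kernel misbehaves, the backup taken above can be restored from the console:
cp ~root/kernel.gz /boot/kernel/kernel.gz
shutdown -r now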
-
I thought I would also post data from the old kernel (6.1-RELEASE-p10) for comparison, right after the box reboot:
netstat -I em1 -w 10
input (em1) output
packets errs bytes packets errs bytes colls
119425 227 136364409 85812 0 10439358 0
121176 316 137712815 87569 0 10499359 0
122182 348 140802834 87724 0 10258430 0
124622 468 144679509 88983 0 10354789 0
131813 427 152227016 94048 0 10657670 0
131419 449 151380609 93904 0 11080772 0
129773 457 149226974 91306 0 10959052 0
So the new kernel seems to lose fewer packets, but it still loses some and crashes.
-
Yes, please update to 1.2-RC1.
http://wiki.pfsense.com/wikka.php?wakka=UsingThePHPpfSenseShell
-
So how did this story end?
-
Haven't heard anything else. But for what it's worth, I'm not seeing this in FreeBSD 6.2 w/ a couple of the pfSense patches added to our kernel.
# netstat -w 10
input (Total) output
packets errs bytes packets errs bytes colls
868544 0 286227325 865170 0 251535256 0
770774 0 222347441 796016 0 225886191 0
731287 0 224789308 766395 0 231316740 0
767101 0 234638730 798607 0 244245061 0
828549 0 245917253 847273 0 242236942 0
782814 0 235875581 809549 0 238561715 0
743229 0 222066030 776047 0 239061961 0
And the PCI IDs of the cards:
Dual port fiber card:
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection Version - 6.2.9
dev.em.0.%driver: em
dev.em.0.%location: slot=1 function=0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x1012 subvendor=0x8086 subdevice=0x1012 class=0x020000
Dual port copper card:
dev.em.2.%desc: Intel(R) PRO/1000 Network Connection Version - 6.2.9
dev.em.2.%driver: em
dev.em.2.%location: slot=1 function=0
dev.em.2.%pnpinfo: vendor=0x8086 device=0x1079 subvendor=0x8086 subdevice=0x1179 class=0x020000
–Bill