Device Polling and Interrupts for Intel Pro NIC
-
I've been using pfSense for a couple of years and it works just great…
But now I'm having some traffic-related performance problems and I'm looking for tuning advice.

A while back I set our system to the "polling" method to reduce CPU usage and it worked fine.

Today I noticed some pages were loading rather slowly and I saw outgoing traffic stuck at about 60Mbps. The log was full of errors:

em0: RX overrun
em0: RX overrun

If I disable polling the traffic goes to 140Mbps, but interrupt time spikes to 90%, which is another problem.

The system is using Intel Pro/1000 NICs (it is a Dell PowerEdge 850).
The ideas I have are:
- Is there any way to increase the buffers so they can hold enough data for polling to work?
- Is there any way to increase the polling rate?
- Is there any way to make interrupt mode less CPU consuming? Maybe the NIC can be set to generate interrupts only when its buffers are fuller? (See the sketch below.)
I would appreciate any advice
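For reference, the knobs involved would look roughly like this, assuming FreeBSD 6.x with the em(4) driver; the values are hypothetical starting points, not tested recommendations:

# Interrupt moderation: have the NIC coalesce RX interrupts.
# Delay units are 1.024 us according to em(4); check your driver version.
sysctl dev.em.0.rx_int_delay=100
sysctl dev.em.0.rx_abs_int_delay=1000

# A larger RX descriptor ring, if your em(4) version exposes this
# loader tunable (set in /boot/loader.conf, then reboot):
hw.em.rxd=1024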
-
-
Maybe it's time for new hardware :D
http://pfsense.blogspot.com/2007/06/polling-and-freebsd.html

Else you could play around with jumbo frames, setting your MTU to 9000 if you haven't already done so.
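If you try that, the change itself is just the interface MTU; a minimal sketch, assuming em0 is the interface in question:

ifconfig em0 mtu 9000

Keep in mind every host and switch on that segment must accept the larger MTU too, or you trade RX overruns for fragmentation trouble.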
I've only done some tests with netio, so a report from a live system would be nice :p
-
Maybe it's time for new hardware :D
http://pfsense.blogspot.com/2007/06/polling-and-freebsd.html

Else you could play around with jumbo frames, setting your MTU to 9000 if you haven't already done so.
I've only done some tests with netio, so a report from a live system would be nice :p

Well, it is only about 10K packets per second being processed, which is not that much - the box has a Celeron 2.4GHz.
I've seen posts from people getting 10 times the throughput on a FreeBSD firewall, probably on better hardware, but I do not think their hardware is that much better.
Also, do you know if time spent doing packet filtering is counted as "interrupt" time as well, and if so, are there any ways to make this process less CPU consuming?
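One way to see where that CPU time actually goes is top in system-process mode; this reflects standard FreeBSD 6.x behavior, though whether it matches this build is an assumption:

top -S
# irq18: em0 / irq21: em1 - time spent in the NIC interrupt threads
# swi1: net               - the netisr thread; with net.isr.direct=0,
#                           IP input (including pf filtering) runs here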
-
Please post the output of vmstat -i from a shell prompt. Let's get a look at the interrupt levels.
-
Please post the output of vmstat -i from a shell prompt. Let's get a look at the interrupt levels.
Sure. Here is vmstat -i
vmstat -i
interrupt                   total    rate
irq13: npx0                     1       0
irq14: ata0                   179       0
irq15: ata1                 83421       1
irq18: em0              222774882    2811
irq21: em1              201538485    2543
cpu0: timer             158461904    2000
Total                   582858872    7356

Here is how it translates to CPU load:
vmstat 5
procs memory page disk faults cpu
r b w avm fre flt re pi po fr sr ad2 in sy cs us sy id
0 3 0 125520 391000 71 0 0 0 71 0 0 6272 693 10202 1 57 42
0 3 0 125520 391000 18 0 0 0 18 0 0 6282 791 9608 1 75 24
2 2 0 125524 390996 17 0 0 0 17 0 0 6040 849 9027 1 75 24
-
I thought I would also post some stats with polling enabled, for comparison:
netstat -I em1 -w 10
input (em1) output
packets errs bytes packets errs bytes colls
64070 5632 71219252 54265 0 10423132 0
64021 5725 71218808 53515 0 9782787 0
64250 5802 70964594 53940 0 10376166 0
64265 5456 71149400 53923 0 10326147 0
64199 5632 70746138 53750 0 10332519 0
63959 6994 70109209 54463 0 10769424 0
64659 6472 72257534 54529 0 10901799 0

vmstat 5
procs memory page disk faults cpu
r b w avm fre flt re pi po fr sr ad2 in sy cs us sy id
0 3 1 103652 429576 140 0 0 0 135 0 0 5283 1691 7886 1 69 30
0 3 0 103652 429576 17 0 0 0 18 0 0 2795 780 3101 1 32 68
0 3 0 103656 429572 17 0 0 1 17 0 3 2805 629 3056 0 34 66
0 3 0 103660 429568 17 0 0 0 18 0 4 2819 661 3102 1 33 66

So if polling is enabled the packet loss becomes massive, though CPU usage drops dramatically.
Interestingly enough, however:

sysctl -a | grep burst
kern.polling.burst: 7
kern.polling.burst_max: 150

So it does not look like any significant number of packets is fetched per poll interval.
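If the burst really is that small, the polling knobs themselves can be raised; a sketch assuming FreeBSD 6.x polling(4), with hypothetical values:

sysctl kern.polling.each_burst=80    # packets taken per poll pass (default 5)
sysctl kern.polling.burst_max=1000   # cap on the adaptive burst size

# Polling runs off the system clock, so a higher HZ polls more often.
# Loader tunable - add to /boot/loader.conf and reboot:
kern.hz=2000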
-
Interrupt levels look okay. Wonder if you are running out of PCI bandwidth?
-
Interrupt levels look okay. Wonder if you are running out of PCI bandwidth?
It is just 120-150Mbit, which is not a lot. The specs for this network card mention much larger capacity.
But anyway, it looks like I see why the packet loss is happening. The losses occur every 10 seconds, which seems to correspond to the state purge interval of the pf filter.
My current theory is that some lock taken during this process blocks fetching packets from the card buffer, which causes the packet loss.
I have about 70,000 states active.
Now I'm looking for a way to configure it so it works better - can it be configured so that the hash table is not purged all at once but in portions (which would reduce the lock time)?
Otherwise, maybe I can increase some kernel-level buffers so packets do not have to sit in the network card buffer while the purge process is active.
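To check whether the theory lines up, both the state table size and the purge interval can be read straight from pf (standard pfctl flags):

pfctl -si    # "State Table" block: current entries, inserts, removals
pfctl -st    # timeout values, including "interval" (the purge interval)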
-
Let me ask Bill. He has the most experience with these large installations.
-
Please try this and see if it helps.
Edit /tmp/rules.debug and add this to the top:
set timeout interval 1
set timeout { tcp.finwait 10, tcp.closed 5 }

Now run this from a shell: pfctl -f /tmp/rules.debug
Please test with these settings and let me know if it is better. We might have to make this a hidden variable. Or something.
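To confirm the new values actually took after the reload (plain pfctl, nothing pfSense-specific):

pfctl -st | egrep 'interval|finwait|closed'
# should now report interval 1s, tcp.finwait 10s, tcp.closed 5s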
-
This does not help, but it does change things:
netstat -I em1 -w 1
input (em1) output
packets errs bytes packets errs bytes colls
9995 249 12152731 6990 0 1242327 0
10572 227 12847133 7346 0 1259203 0
10748 359 13196148 7666 0 1305645 0
11388 310 14000169 8052 0 1397665 0
11842 259 14595778 8389 0 1464044 0

So now we have the same packet loss every second instead of once per 10 seconds.
It looks like purging still happens over the whole table rather than in pieces.
Changing it to purge once per 60 seconds gives this:
124838 0 156441927 84043 0 12043484 0
117116 897 146381898 80182 0 11669285 0
116284 0 144356067 80062 0 11898673 0

(10-second increments)
With the purge once per 60 seconds more packets are lost in spikes, but fewer on average.
Interesting - isn't there some kernel buffer one can increase to avoid this problem?
It looks very strange to me that purging the state table blocks the network device buffer from being processed.

Please try this and see if it helps.
Edit /tmp/rules.debug and add this to the top:
set timeout interval 1
set timeout { tcp.finwait 10, tcp.closed 5 }

Now run this from a shell: pfctl -f /tmp/rules.debug
Please test with these settings and let me know if it is better. We might have to make this a hidden variable. Or something.
-
You did test this with polling off, right?
-
You did test this with polling off, right?
Sure. Polling is off and I'm not even trying to turn it on because everything becomes so much worse…
-
Okay, I will run this by Bill and I have emailed Max Laier who might have some tuning advice for us.
Just for the record, what kind of bandwidth are you pushing? Can you share RRD bandwidth and packet graphs?
EDITED: Spelling mistakes.
-
Can you send us the output of:
sysctl net.inet.ip.intr_queue_drops
And maybe increase net.inet.ip.intr_queue_maxlen
sysctl net.inet.ip.intr_queue_maxlen=250
And let me know if that helps.
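Worth noting: a sysctl set from the shell is lost on reboot. On stock FreeBSD the persistent place is /etc/sysctl.conf (whether pfSense preserves that file across config regeneration is an assumption worth checking):

echo 'net.inet.ip.intr_queue_maxlen=250' >> /etc/sysctl.conf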
Also…send the output of:
sysctl net.isr
Thanks
–Bill
-
Yes, please provide the outputs that Bill is requesting.
Once you have posted that output and tried upping the sysctl, if we still have not made any progress, I have received a patch from Max that might help. If we get to that point I will compile a custom test kernel for you.
-
Yes, please provide the outputs that Bill is requesting.
Once you have posted that output and tried upping the sysctl, if we still have not made any progress, I have received a patch from Max that might help. If we get to that point I will compile a custom test kernel for you.
sysctl net.isr
net.isr.direct: 0
net.isr.count: 444903
net.isr.directed: 0
net.isr.deferred: 444903
net.isr.queued: 321
net.isr.drop: 0
net.isr.swi_count: 411596

sysctl net.inet.ip.intr_queue_drops
net.inet.ip.intr_queue_drops: 0
After the queue was increased to 250:
netstat -I em1 -w 100
input (em1) output
packets errs bytes packets errs bytes colls
638808 312 765089702 450895 0 70931893 0
668226 951 808850726 458403 0 63521833 0

So we're still losing packets. The number is lower, but the traffic has also dropped about 30% from the day's high, so the problem did not go away.
-
-
Thanks Scott,
I tried this kernel and unfortunately with it the system crashes within 2 to 15 minutes of booting.
There is still packet loss:

netstat -I em1 -w 10
input (em1) output
packets errs bytes packets errs bytes colls
125527 73 145482529 89756 0 8569116 0
125283 0 144453737 90196 0 8797933 0
114783 0 128255459 82857 0 8698382 0
112750 109 125654284 81726 0 8263655 0
112577 0 125250797 81648 0 7740500 0
113482 0 124736359 81646 0 7689588 0
110745 0 121549353 80587 0 8305343 0
111846 39 121826640 81538 0 8449537 0

This is with the default 10-second purge interval.
By the way, this box is running pfSense 1.0.1 - would you recommend upgrading to 1.2.0RC1? Is it safe enough to do on a remote box?
Also, I think it would be very handy to add a feature for entering advanced pf settings somewhere so they are kept in the config.
Changing the purge interval and other advanced settings is very inconvenient now, as rules.debug is always recreated from scratch.

From a shell, run this (testing a new kernel):
Back up the old kernel:
cp /boot/kernel/kernel.gz ~root/
fetch -o /boot/kernel/kernel.gz http://www.pfsense.com/~sullrich/kernel.gz
shutdown -r now

This will reboot your box, so be prepared.
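After the reboot, a quick check confirms the replacement kernel is actually running (the build string should differ from the old one):

uname -a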
-
I thought I would also post data from the old kernel (6.1-RELEASE-p10) for comparison, taken right after the box rebooted:
netstat -I em1 -w 10
input (em1) output
packets errs bytes packets errs bytes colls
119425 227 136364409 85812 0 10439358 0
121176 316 137712815 87569 0 10499359 0
122182 348 140802834 87724 0 10258430 0
124622 468 144679509 88983 0 10354789 0
131813 427 152227016 94048 0 10657670 0
131419 449 151380609 93904 0 11080772 0
129773 457 149226974 91306 0 10959052 0

So the new kernel seems to lose fewer packets, but it still loses some, and it crashes.