Heavy CPU load?
-
Here's something you could try to see if you can reduce the CPU utilisation in times of high traffic load.
Enable polling (in the web GUI, System -> Advanced, in the Miscellaneous section, check the box labelled Use device polling). With polling enabled, the NICs have interrupts disabled, and once per clock interrupt (currently every millisecond) the device driver is called for each interface to process received frames and frames whose transmission has completed. This tends to increase latency somewhat but has been observed to considerably reduce CPU time in some circumstances. I believe polling can be enabled and disabled without rebooting.
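As a rough illustration of why polling can cut CPU time at the cost of some latency, here is a minimal Python sketch. It is a simplified model and not how the FreeBSD driver is implemented: it assumes one interrupt per packet when polling is off (no interrupt moderation), and the function names and the 50000 packets/s figure are made up for the example.

```python
def driver_calls_per_second(packets_per_second, poll_hz=1000, polling=False):
    """Estimate how often the NIC driver is entered each second.

    Assumption: without polling, every packet raises its own interrupt.
    With polling, the driver is called once per clock tick (poll_hz),
    no matter how many packets arrived in between.
    """
    if polling:
        return poll_hz
    return packets_per_second


def worst_case_added_latency_ms(poll_hz=1000):
    """A frame arriving just after a poll waits one full tick for the next."""
    return 1000.0 / poll_hz


if __name__ == "__main__":
    pps = 50000  # hypothetical traffic level for one interface
    print("interrupt mode calls/s:", driver_calls_per_second(pps))
    print("polling mode calls/s:  ", driver_calls_per_second(pps, polling=True))
    print("worst-case added latency (ms):", worst_case_added_latency_ms())
```

In this model the number of driver entries stops growing with traffic once polling is on, which is where the CPU saving would come from, while each frame may wait up to one clock tick before it is processed.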
-
Oh my god.. not the answer I wanted to hear..
I looked on this site:
http://doc.pfsense.org/index.php/Hardware_requirements
The page says:
"Intel Pro/100 and Pro/1000 cards tend to be the best performing and most reliable on pfsense. Cheap cards like those containing Realtek chipsets (FreeBSD rl driver) are very poor performers in comparison."
That's why I bought Intel Pro/1000…
:(
Same here… what gives? My boxes aren't in production yet so I still have time to fix this issue, but not too much longer.
Any word on whether polling has helped?
Also, see http://forum.pfsense.org/index.php/topic,16236.0.html
-
Device polling could be used to reduce CPU usage, yes, but my performance numbers with Linux are actually with no interrupt moderation at all, and certainly not using device polling.
So even if device polling reduces CPU usage, it doesn't fix the underlying problem.
With the dynamic interrupt moderation algorithm enabled in Linux, the performance numbers are even higher, with just a tad more latency (still not pfSense levels though) but reduced CPU usage compared to running with InterruptThrottleRate=0,0 to get the lowest possible packet latency, as I do. (Interrupt moderation and device polling are not the same thing, by the way.)
But we shouldn't make this a huge problem, as the performance is still more than good enough for almost everyone. The current implementation is clearly lacking in maximum performance compared to the Linux implementation, though, for those with firewalls that really get hit by huge amounts of traffic (maybe not so huge with complex scenarios using load balancing, shaping, large rulesets, VPN and so on).
-
In this post, it seems that FreeBSD 6.4 was OK, but there was a regression in 7.1 with the em driver:
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/performance/2009-04/msg00004.html
On the Linux side, e1000e supports only the PCI-E Intel NICs, while e1000 supports the whole family. Interrupt moderation is only in the newer e1000e driver.
That said, it's all open source, and Intel does supply drivers for FreeBSD:
http://downloadcenter.intel.com/T8Clearance.aspx?url=/17509/eng/em-6.9.8.tar.gz&agr=Y&ProductID=880&DwnldID=17509&lang=eng
but I haven't compared the Intel drivers to what's in the FreeBSD tree. Maybe ping freebsd@intel.com to see if interrupt moderation can be added?
regards, Charlie
-
Hello,
The "device polling" option did not help me.
I get 80Mbit/s MAX… 80Mbit/s = 100% CPU load...
What NICs work best with pfsense then?
Tell me and I will throw those shitty 3x "Intel PRO/1000 MT Dual Port Server Adapter" out the window!
My old D-link router for €20 gave me more throughput...
-
I was thinking. You're running on a PCI bus, not a PCI-X bus, correct?
1. I don't seem to remember seeing any post with 5 LB and p2p being used before. So before you throw them out, it would be interesting to see the load when downloading a DVD distro with a download manager like getright.
2. Under System -> Advanced
Change Firewall Optimization Options to aggressive and set Firewall Maximum States to 500000
3. Will the load change if it was 5 different clients using 1 WAN each?
Basically I tend to believe the bottleneck could be related to the PCI bus, pps or slbd.
-
According to the graphs, the lion's share of the load comes from 'system'. Why don't you give us the output of
top
that you get during the test? slbd is known for its bad manners in creating CPU load. Try
killall -9 slbd
and repeat the test (or run it during the test). I do not believe that it's a problem with Intel's driver.
-
Hi guys!
Thx for trying to help me out..
Yes, I only have PCI… but why all the CPU usage then?
I now have aggressive mode + 500000 states.
No change there..
Then I tried Eugene's tip.
No change…. :(
When the top output was taken the total speed was ~88Mbit/s (torrents)
I need to find a fast server to do the getright/FTP test..
-
You could try systat -vm 1 and others; see http://www.acmesecurity.org/~thiago/public/freebsd/FreeBSD_Bottleneck_Detection.pdf
I need to find a fast server to do the getright/FTP test..
For Ubuntu there are lots of locations. AFAIR, in GetRight you split the file into 5 segments and use file mirrors to search for different locations.
-
Torrents are generally a good way of doing bandwidth testing. Grab yourself the DVD of your chosen Linux distro (or a randomly selected one).
-
Your top output shows an unexpected (by me) large number of dhclient processes using an unexpectedly large amount of CPU time.
On my system:
uptime
7:22AM up 43 days, 10:39, 2 users, load averages: 0.31, 0.31, 0.26
ps ax | grep dhclient
335 ?? Is 0:00.51 dhclient: rl0 (dhclient)
284 con- I 0:00.10 dhclient: rl0 [priv] (dhclient)
On my system there are two dhclient processes which in 43 days haven't even used a second of CPU time between them, while on yours (uptime of over 55 days) you have at least 7 dhclient processes which have each used at least 130 MINUTES of CPU time, and all 7 are in the RUN state. On my system only the WAN interface (rl0) acquires an IP address by DHCP.
How many interfaces should be trying to acquire an IP address by DHCP?
Why are so many DHCP clients all in the run state? (Are your leases expiring every millisecond? :) )
Are there any log files which would give a hint as to why the DHCP clients are so busy?
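If you want to check this on your own box without eyeballing the list, here is a minimal Python sketch that picks the dhclient lines out of pasted `ps ax` output and converts the TIME column to seconds. It assumes the five-column layout shown above (PID, TT, STAT, TIME, COMMAND) and a TIME field of the form minutes:seconds; the function names and the 60-second threshold are made up for this example.

```python
def parse_cpu_seconds(time_field):
    """Convert a ps TIME field such as '130:05.20' (minutes:seconds) to seconds."""
    minutes, seconds = time_field.split(":")
    return int(minutes) * 60 + float(seconds)


def dhclient_usage(ps_output):
    """Return (pid, state, cpu_seconds) for every dhclient line in ps output."""
    rows = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID, TT, STAT, TIME, COMMAND
        if len(fields) < 5:
            continue
        pid, _tty, state, cpu_time, command = fields
        # Skip the header, unrelated processes and the grep process itself.
        if not pid.isdigit() or not command.startswith("dhclient"):
            continue
        rows.append((int(pid), state, parse_cpu_seconds(cpu_time)))
    return rows


# The two lines from the post above, used as sample input.
SAMPLE = """\
335 ?? Is 0:00.51 dhclient: rl0 (dhclient)
284 con- I 0:00.10 dhclient: rl0 [priv] (dhclient)
"""

if __name__ == "__main__":
    for pid, state, cpu in dhclient_usage(SAMPLE):
        flag = "BUSY" if cpu > 60 else "ok"  # hypothetical threshold
        print(pid, state, cpu, flag)
```

On a healthy system like mine every line should come out as "ok"; a dhclient that has burned 130 minutes would show up with 7800 seconds or more.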
-
It's true, it seems like _dhcp (dhclient) is what makes the CPU load?
I have 5 NICs on DHCP… and a DHCP server on the 6th interface...
I get lots of entries like this:
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em4
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em3
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em5
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em2
but I know why...:
They are all on the same VLAN…
:/
Can this be the problem?
-
May I ask you about the reason you have five WAN interfaces? with one ISP… :-\
-
I have 5 NICS on DHCP… and DHCP-Server on the 6 interface...
It's not clear to me what this means. I guess you are saying you have most (or all) of your interfaces serving DHCP addresses AND requesting DHCP addresses from another DHCP server. This is not a good idea. Your DHCP server interfaces should have static (fixed) IP addresses.
i get alots entrys like this:
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em4
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em3
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em5
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em2
Your network topology and/or address assignments are messed up. 85.226.120.1 is accessible on multiple interfaces; it should be accessible over only one interface (unless you have bridged interfaces, but then why would you have a switch?). And printing these messages repeatedly will be another consumer of CPU time.
What are you trying to accomplish with this configuration? At first sight it appears overly complex.
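To see at a glance how many interfaces are answering for the same address, the kernel messages can be grouped with a few lines of Python. This is only a sketch: the regular expression is written against the exact message format quoted above, and the names `ARP_RE`, `summarize` and `SAMPLE_LOG` are made up for the example.

```python
import re
from collections import defaultdict

# Matches: "arp: <ip> is on <if> but got reply from <mac> on <if>"
ARP_RE = re.compile(r"arp: (\S+) is on (\S+) but got reply from (\S+) on (\S+)")


def summarize(log_text):
    """Map (ip, expected interface, mac) to the sorted list of other
    interfaces on which replies for that address were also seen."""
    seen = defaultdict(set)
    for line in log_text.splitlines():
        match = ARP_RE.search(line)
        if match:
            ip, expected_if, mac, reply_if = match.groups()
            seen[(ip, expected_if, mac)].add(reply_if)
    return {key: sorted(interfaces) for key, interfaces in seen.items()}


# The four log lines quoted in this thread.
SAMPLE_LOG = """\
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em4
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em3
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em5
May 28 19:51:02 kernel: arp: 85.226.120.1 is on em1 but got reply from 00:03:a0:3b:80:00 on em2
"""

if __name__ == "__main__":
    for (ip, expected_if, mac), others in summarize(SAMPLE_LOG).items():
        print(ip, "expected on", expected_if, "also seen on", ", ".join(others))
```

For the lines above this groups everything under one entry: the same gateway and MAC address, expected on em1 but also answering on em2 to em5, which is what you would expect if all five WAN ports sit in one broadcast domain.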
-
can this be the problem?
Yep. This is most likely your problem as the DHCP processes shouldn't be using any CPU at all.
It still doesn't solve the em problems, but that's probably not what's limiting you, given that massive CPU usage from DHCP.
Your problem is probably solved by ensuring that the DHCP server is not running on the WAN interfaces, as it seems that you are actually running a DHCP server on those in addition to the LAN interface.
This should be a configurable setting.
-
My ISP gives me 10Mbit/s for every IP we use.
Max 5 IP addresses.
That's why I use five NICs to get my five IPs.
So with one IP 10Mbit/s, with two 20Mbit/s.. and with five 50Mbit/s
Okay?
My ISP will never give me static IPs
Always DHCP…
Here is how it works:
em0/LAN Static 192.168.1.111. And runs DHCP Server for LAN clients.
em1/WAN Dynamic DHCP Client
em2/WAN1 Dynamic DHCP Client
em3/WAN2 Dynamic DHCP Client
em4/WAN3 Dynamic DHCP Client
em5/WAN4 Dynamic DHCP Client
If I do killall dhclient
my CPU usage gets low. But pfsense stops working after a while.....
So what is wrong? :(
Okay... killing dhclient works... but the firewall dies, so I have to restart it after a while....
-
All your ports share the same switch?
I've never had good luck when I had two DHCP servers (your pfsense LAN and your ISP's modem) on the same switch…
Can you move your lan to another switch to rule that possible issue out?
-
I use VLANs.
So it's physically one switch, but inside they are different.
You can read about it here: http://en.wikipedia.org/wiki/VLAN
-
I know what you are trying to do, so I guess I'll ask outright…
Have you ruled out a misconfiguration on your switch as the root cause of your problem?
What else have you tried in your troubleshooting process?
Start with the basics and add one element at a time until you can reproduce the result.
Your setup, while innovative, is not typical.
Good Luck!
-
There is nothing wrong with the switch. As you can see here, the VLAN settings are so simple.
You guys just helped me to see that the problem is with dhclient.
What is wrong with my setup that makes it non-typical?
What else can I do to troubleshoot? I have killed dhclient and everything works fine..