Issues with device polling (alix2d13, 2.0-RELEASE, 100/7 WAN)



  • Hi

    I'm new to pfSense (had a sonicWALL TZ 210 before)…

    If I disable "device polling" I can't max out my WAN connection (get only about half speed = 46-52 Mbit) due to interrupts, but WebGUI access is fine and CPU usage is fine too.

    If I enable "device polling" I'm able to max out my WAN connection (75-106 Mbit, depending on my ISP), but CPU usage stays at about 90-100% all the time, even when the system is idle, and WebGUI access is very sluggish or even unresponsive.  :o

    top -SH

    last pid:  5553;  load averages:  2.64,  2.79,  2.62              up 0+00:42:41  10:15:51
    119 processes: 7 running, 101 sleeping, 11 waiting
    CPU:  0.0% user,  0.8% nice, 91.9% system,  7.3% interrupt,  0.0% idle
    Mem: 81M Active, 29M Inact, 40M Wired, 34M Buf, 84M Free
    Swap:

      PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
       17 root      76 ki-6     0K     8K RUN     35:00 90.97% idlepoll
       11 root     -44    -     0K   104K RUN      2:06  3.96% {swi1: netisr 0}
    33340 root      65   20  3352K  1068K nanslp   0:57  0.98% blinkled
    30716 root      65   20  3352K  1068K RUN      0:57  0.98% blinkled
       13 root     -16    -     0K     8K -        0:43  0.00% yarrow
        0 root     -16    0     0K    56K sched    0:38  0.00% {swapper}
       11 root     -64    -     0K   104K WAIT     0:09  0.00% {irq14: ata0}
       11 root     -32    -     0K   104K RUN      0:08  0.00% {swi4: clock}
    38856 root      48    0 33116K 18968K accept   0:05  0.00% php
       10 root     171 ki31     0K     8K RUN      0:03  0.00% idle

    […]

    sysctl kern.polling

    kern.polling.idlepoll_sleeping: 0
    kern.polling.stalled: 0
    kern.polling.suspect: 117
    kern.polling.phase: 0
    kern.polling.handlers: 2
    kern.polling.residual_burst: 0
    kern.polling.pending_polls: 0
    kern.polling.lost_polls: 6353
    kern.polling.short_ticks: 0
    kern.polling.reg_frac: 20
    kern.polling.user_frac: 50
    kern.polling.idle_poll: 1
    kern.polling.each_burst: 5
    kern.polling.burst_max: 150
    kern.polling.burst: 150

    I would really appreciate it if someone could take a look at my problem and help me out with some advice.



  • @dr3do:

    If I disable "device polling" I can't max out my WAN connection (get only about half speed = 46-52 Mbit) due to interrupts, but WebGUI access is fine and CPU usage is fine too.

    Does your speed test use a single TCP connection or multiple connections? Is single TCP connection
    speed what you care about or aggregate speed across multiple connections?

    @dr3do:

    If I enable "device polling" I'm able to max out my WAN connection (75-106 Mbit, depending on my ISP), but CPU usage stays at about 90-100% all the time, even when the system is idle, and WebGUI access is very sluggish or even unresponsive.  :o

    Polling means the network device drivers run with interrupts disabled and get called by the OS periodically in the place of interrupts. This can reduce overhead by effectively coalescing interrupts but often means the system is polling network drivers instead of running user tasks. The kern.polling.* sysctls can be tweaked to make the system a bit more responsive but at the cost of increased network latency.



  • Hi wallabybob, thanks for your reply.  :D

    @wallabybob:

    Does your speed test use a single TCP connection or multiple connections? Is single TCP connection speed what you care about or aggregate speed across multiple connections?

    Have tested single connections and multiple connections. In both cases (with disabled device polling) speed is limited to about 50Mbit.

    Well… my ISP gave me the choice between 50Mbit and 100Mbit… I decided to take 100Mbit because I do a lot of online stuff. I have a small server (Mac Mini Server) and a bunch of Macs at home. I thought I could take advantage of the faster connection, especially with bandwidth management (traffic shaping) configured. Most of the day there is only a single connection (server to internet), but when we're at home the number of connections goes up.

    I think the answer should be: I care about aggregate speed across multiple connections

    @wallabybob:

    Polling means the network device drivers run with interrupts disabled and get called by the OS periodically in the place of interrupts. This can reduce overhead by effectively coalescing interrupts but often means the system is polling network drivers instead of running user tasks. The kern.polling.* sysctls can be tweaked to make the system a bit more responsive but at the cost of increased network latency.

    Thanks for your explanation. So I tend to conclude that this "Alix box" is underpowered for NAT/firewalling, 1-2 VPN connections (for myself when I'm not at home), traffic shaping / bandwidth management (not installed right now), DHCP, DNS and VoIP. Would you agree?  ???

    Before posting here, I tried to find something useful about tweaking/tuning device polling:
    http://www.cyberciti.biz/faq/freebsd-device-polling-network-polling-tutorial
    http://www.mail-archive.com/support@pfsense.com/msg02586.html

    I have played around with the settings posted in the second link, but so far without result.



  • Was curious and installed m0n0wall on the same hardware for testing purposes…

    Without device polling activated it gets only about 25Mbit, but CPU usage is fine.
    With device polling activated it gets about 85Mbit and CPU usage is fine too.

    As I wrote, I'm very new to pfSense, so this looks like a bug/problem to me… it gives me hope that someone can help me get the CPU usage down.



  • I suggest you get the values of the kern.polling sysctl variables when running m0n0wall and compare them with the values when pfSense is running. If you use m0n0wall's values in pfSense you might get polling behaviour more to your liking.
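
A minimal sketch of that comparison, assuming the output of `sysctl kern.polling` from each system has been saved to a text file. The function name `polling_diff` and the file names in the usage note are made up for illustration. It prints only the variables whose values differ between the two dumps.

```shell
#!/bin/sh
# polling_diff FILE_A FILE_B
# Each file holds saved "sysctl kern.polling" output ("name: value" lines).
# Prints "name: value_in_A -> value_in_B" for every variable that differs.
polling_diff() {
    awk -F': ' '
        NR == FNR { a[$1] = $2; next }     # first file: remember each value
        ($1 in a) && a[$1] != $2 {         # second file: report changed values
            printf "%s: %s -> %s\n", $1, a[$1], $2
        }
    ' "$1" "$2"
}
```

Usage would be something like `polling_diff m0n0wall.txt pfsense.txt` after running `sysctl kern.polling > m0n0wall.txt` on one system and `sysctl kern.polling > pfsense.txt` on the other. Note that some of the listed variables are status counters and will differ from run to run regardless of configuration.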



  • @dr3do:

    Thanks for your explanation. So I tend to conclude that this "Alix box" is underpowered for NAT/firewalling, 1-2 VPN connections (for myself when I'm not at home), traffic shaping / bandwidth management (not installed right now), DHCP, DNS and VoIP. Would you agree?

    I think getting 50Mbit is just about right on an ALIX with the above functions; you generally want to run with Device Polling off.  Considering you have a 100Mbit connection (so jealous), I would say you are very underpowered for your network.

    If the power consumption/size of a pfSense box is a concern, then your next best option would be a Mini-ITX Atom setup, but that's going to set you back a few hundred.  If power consumption/size isn't a great concern, then see if you can pick up an Optiplex 745 SFF with a Core2Duo processor from eBay for about a hundred bucks, slap an Intel dual LAN PCI or PCIe card in there and say goodbye to your bandwidth/CPU concerns.



  • @wallabybob:

    I suggest you get the values of the kern.polling sysctl variables when running m0n0wall and compare them with the values when pfSense is running.

    Great idea, will do it!  8)



  • @onhel:

    […] I would say you are very underpowered for your network.

    Thanks for your assessment.

    OT: And hey, no need to be jealous… the uplink has only 7Mbit.  ::)

    @onhel:

    If the power consumption/size of a pfSense box is a concern, then your next best option would be a Mini-ITX Atom setup, but that's going to set you back a few hundred.

    If I don't get a better result from comparing the sysctl settings (m0n0wall vs. pfSense), I will take a look at the hardware available in Switzerland…


  • Rebel Alliance Developer Netgate

    FYI- on ALIX your performance with and without polling is going to be comparable. The bottleneck is the CPU in both cases, there isn't any wiggle room on that device.

    Polling uses CPU to poll the NICs for data. It will always show 100% usage since it polls in the idle loop.

    It will give up CPU to other tasks if they need it, that's just how polling works.

    That said, polling won't buy you anything on ALIX really. You may as well turn it off.

    I've passed about 87Mbit/s through an ALIX last time I tested it. That's in the clear though. Any amount of VPN traffic that the device has to handle will bring that way down. As will anything else that uses up CPU.
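
To see how much of the reported CPU load is just the polling thread, one option is to pull the idlepoll line out of `top -SH` output. The sketch below assumes the column layout shown in the first post (WCPU as the second-to-last field, command name last); the function name `idlepoll_wcpu` is made up for illustration.

```shell
#!/bin/sh
# Reads "top -SH" output on stdin and prints the WCPU percentage of the
# idlepoll kernel thread (without the "%" sign), or nothing if the thread
# is not listed.
idlepoll_wcpu() {
    awk '$NF == "idlepoll" {
        pct = $(NF - 1)        # WCPU column, e.g. "90.97%"
        sub(/%$/, "", pct)     # strip the trailing percent sign
        print pct
        exit
    }'
}
```

Fed with the output from the first post (for example `idlepoll_wcpu < top.txt`, where `top.txt` is a saved copy), it reports the share taken by idlepoll. Since that thread runs at idle priority, this is roughly the CPU time other tasks could still claim.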



  • @wallabybob:

    I suggest you get the values of the kern.polling sysctl variables when running m0n0wall and compare them with the values when pfSense is running.

    I have compared the kern.polling settings between m0n0wall and pfSense, and the difference ("static" entries) is:

    kern.polling.idlepoll_sleeping: 1
    kern.polling.phase: 2
    kern.polling.handlers: 2
    kern.polling.burst: 150

    But it seems that this unfortunately makes no difference for pfSense. Speed (with device polling activated) is fine, but CPU usage while idle is at maximum all the time. m0n0wall did not show this behavior during my tests: there the CPU usage went down while idle. m0n0wall would solve this issue, but it's not the firewall I'm looking for. Even though I'm new to pfSense… I really like it, it's great!



  • @jimp What an honor that you replied to me.  8)

    @jimp:

    Polling uses CPU to poll the NICs for data. It will always show 100% usage since it polls in the idle loop.

    At the moment I'm trying to understand why the CPU usage on m0n0wall really goes down when idle, but not on pfSense…

    The only reason I can think of, with my limited knowledge, is that sysctl kern.polling.idlepoll_sleeping=1 does NOT work, meaning that on pfSense it always stays at kern.polling.idlepoll_sleeping: 0.

    If I try to enable idlepoll_sleeping via ssh shell, I get this message in return:

    sysctl kern.polling.idlepoll_sleeping=1
    sysctl: oid 'kern.polling.idlepoll_sleeping' is read only

    As I have read here, this setting seems to be responsible for the high CPU usage while idle:

    kern.polling.idle_poll
        Controls if polling is enabled in the idle loop.  There are no
        reasons (other than power saving or bugs in the scheduler's
        handling of idle priority kernel threads) to disable this.

    What do you think?

    @jimp:

    That said, polling won't buy you anything on ALIX really. You may as well turn it off.

    On my alix2d13 there's a difference: ~30Mbit/s (turned off: 46-50Mbit/s; turned on: 74-82Mbit/s)

    PS: Sorry for my english, I'm not a native speaker.  :-[



  • Try adding that line to the /boot/loader.conf.local file and reboot.

    EDIT: By "that line" I mean: kern.polling.idlepoll_sleeping=1



  • @Metu69salemi:

    Try adding that line to the /boot/loader.conf.local file and reboot.

    OK, will try this. Really loader.conf.local and not the (already existing) loader.conf?

    EDIT: Have found this, so… think my question is obsolete.



  • @Metu69salemi:

    Try adding that line to the /boot/loader.conf.local file and reboot.

    I tried adding it via System Tunables, but after a reboot it's still off.

    Then I figured out that I have to add it via exec.php (echo "kern.polling.idlepoll_sleeping=1" >> /boot/loader.conf.local). After that I rebooted pfSense, but it's still off too.

    After reboot:
    $ sysctl kern.polling
    kern.polling.idlepoll_sleeping: 0



  • I'm out of ideas. Maybe someone else knows better.



  • A quick scan of the FreeBSD source code suggests:

    • kern.polling.idlepoll_sleeping is a status variable reporting whether the network device polling loop is sleeping (1) or polling (0). Since it is reporting a kernel status it is read only to a user.

    • kern.polling.idle_poll is a kernel variable specifying whether the network polling loop should take a brief nap (0) after polling all devices or immediately go back and poll all devices to see if they have work to do (1).

    The device polling loop is supposed to run as the lowest priority task in the system apart from the idle loop.

    @dr3do:

    If I enable "device polling" I'm able to max out my WAN connection (75-106 Mbit, depending on my ISP), but CPU usage stays at about 90-100% all the time, even when the system is idle, and WebGUI access is very sluggish or even unresponsive.  :o

    You are probably running at about maximum throughput. Your original post shows kern.polling.idle_poll is 1 so your system will always be busy because it will be polling network devices when there is nothing else to do. Try setting kern.polling.idle_poll to 0 and see what happens to both throughput and GUI responsiveness. It might also be worth trying setting kern.polling.burst_max down from 150 to say 75 or even 40 to see what happens.
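
A minimal sketch of that experiment from a shell on the firewall. The helper names `run` and `try_polling` and the `APPLY` switch are made up for illustration; only the two sysctl names come from the output earlier in this thread. By default it just prints the commands so they can be checked first.

```shell
#!/bin/sh
# Dry-run helper: prints the command by default; with APPLY=1 it runs the
# command instead (needs root on the FreeBSD/pfSense box).
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# try_polling IDLE_POLL BURST_MAX
# Sets the two tunables discussed above to the given values.
try_polling() {
    run sysctl kern.polling.idle_poll="$1"
    run sysctl kern.polling.burst_max="$2"
}
```

For example, `try_polling 0 75` prints the two commands, and `APPLY=1 try_polling 0 75` applies them. Values set with sysctl at runtime are lost on reboot, so this is only suitable for testing throughput and GUI responsiveness before making anything permanent.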



  • @jimp:

    FYI- on ALIX your performance with and without polling is going to be comparable. The bottleneck is the CPU in both cases, there isn't any wiggle room on that device.

    Polling uses CPU to poll the NICs for data. It will always show 100% usage since it polls in the idle loop.

    It will give up CPU to other tasks if they need it, that's just how polling works.

    That said, polling won't buy you anything on ALIX really. You may as well turn it off.

    I've passed about 87Mbit/s through an ALIX last time I tested it. That's in the clear though. Any amount of VPN traffic that the device has to handle will bring that way down. As will anything else that uses up CPU.

    Damn, that's what I'm seeing too (87) on my 100 meg link with Virgin in the UK.  I really like my ALIX but will get upset about not using all the bandwidth on offer.

    Guess it's an Atom board for me.  Can anyone recommend a nice firewall enclosure?  Slim as possible.


  • Rebel Alliance Developer Netgate

    Yeah 100Mbit/s is just beyond what the ALIX is capable of.

    As for Atoms, there are pre-built ones out there like the FW-7535 from Netgate, and the new Soekris net6501. Otherwise, I'd go for a nice Supermicro 1U atom setup.

    I have a net6501 here that just arrived a couple days ago. It's humming along nicely so far but I have yet to make any real tests happen on it (E_NOTIME).



  • Looking at the other posts, I might go for a Sandy Bridge setup.  If 500 MHz gives 87 Mbit/s, then a 1000 Mbit network needs roughly 5750 MHz.  Assuming the network card drivers can run at least one thread per processor per card, that's a 2.87 GHz dual core or above.

    I know we won't have gigabit too soon, but my mate in Sweden is already on a 1 gig connection just as I got my 100, lol.  (He only sees 300 currently, no doubt because of his network gear.)

    Might as well build it to last.

    I was thinking about a Pentium G850 (dual core), which is rated at 2.90GHz.  Is this thinking sound?

    How does pfSense, or rather BSD, make use of the cores with network cards?  Would a quad core give more headroom for the firewall if it was only running the two interfaces (with an Intel or other nice onboard NIC)?
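
The back-of-the-envelope estimate in this post can be reproduced with a small helper. It assumes throughput scales linearly with clock speed and splits perfectly across cores, which is the poster's simplification; real throughput does not scale linearly across different CPU architectures, so treat the result as a rough figure only. The function name `required_mhz` is made up for illustration.

```shell
#!/bin/sh
# required_mhz BASE_MHZ BASE_MBIT TARGET_MBIT [CORES]
# Linear scaling: clock needed per core to reach TARGET_MBIT, given that
# BASE_MHZ handled BASE_MBIT. Assumes perfect scaling across CORES (default 1).
required_mhz() {
    awk -v mhz="$1" -v base="$2" -v target="$3" -v cores="${4:-1}" \
        'BEGIN { printf "%.0f\n", mhz * target / base / cores }'
}
```

With the numbers from this thread, `required_mhz 500 87 1000` gives 5747 (MHz in total) and `required_mhz 500 87 1000 2` gives 2874 (MHz per core on a dual core), which matches the rounded figures above.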

