Keglimit



  • Hi all,

    I'm trying to install pfSense 2.0B4 on a Dell R610.
    The Dell has 4 integrated Broadcom NICs (bce) and two additional quad-port NICs (igb).

    At the first reboot after installation, pfSense hangs. It may be while setting up the interfaces, or at some other step such as the gateway monitors.
    Pressing Ctrl+T gives me the following output:

    load: 0.00 cmd: php 9131 [keglimit] 371.94r 0.11u 0,02s 0% 23364K

    What do you think about that?
    Is it an unkillable process, related to a bug in the igb driver?

    Thanks
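The Ctrl+T status line quoted above can be split into its fields to make the symptom explicit. Below is a minimal Python sketch, assuming the layout is load, command, PID, wait channel in brackets, real/user/system time, CPU percentage and resident size. The field labels, the `looks_blocked` helper and its thresholds are this sketch's own choices, not part of pfSense or FreeBSD.

```python
import re

# Status line copied verbatim from the post (note the comma in "0,02s").
SAMPLE = "load: 0.00 cmd: php 9131 [keglimit] 371.94r 0.11u 0,02s 0% 23364K"

# Field names are illustrative labels chosen for this sketch.
STATUS_RE = re.compile(
    r"load:\s*(?P<load>[\d.,]+)\s+"
    r"cmd:\s*(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<wchan>[^\]]+)\]\s+"
    r"(?P<real>[\d.,]+)r\s+(?P<user>[\d.,]+)u\s+(?P<sys>[\d.,]+)s\s+"
    r"(?P<cpu>\d+)%\s+(?P<rss_kb>\d+)[Kk]"
)


def to_float(text):
    """Accept either '.' or ',' as the decimal separator."""
    return float(text.replace(",", "."))


def parse_status(line):
    """Split one Ctrl+T status line into a dict of typed fields."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("not a recognised status line: %r" % line)
    return {
        "load": to_float(match.group("load")),
        "cmd": match.group("cmd"),
        "pid": int(match.group("pid")),
        "wchan": match.group("wchan"),
        "real": to_float(match.group("real")),
        "user": to_float(match.group("user")),
        "sys": to_float(match.group("sys")),
        "cpu_percent": int(match.group("cpu")),
        "rss_kb": int(match.group("rss_kb")),
    }


def looks_blocked(status, min_real=60.0, max_cpu_share=0.01):
    """Hypothetical heuristic: long wall-clock time, almost no CPU time."""
    cpu_time = status["user"] + status["sys"]
    return status["real"] >= min_real and cpu_time / status["real"] <= max_cpu_share


if __name__ == "__main__":
    status = parse_status(SAMPLE)
    print(status["cmd"], status["pid"], "waiting on", status["wchan"])
    print("blocked:", looks_blocked(status))
```

Read this way, the line describes a php process that has existed for about 372 seconds of wall-clock time while using well under one second of CPU, and that is sleeping on the wait channel named keglimit rather than spinning.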



  • If I recall correctly the BCEs are pretty hungry for kernel virtual address space, much more hungry than the igbs.

    Speculation: the default kernel memory allocation doesn't cope well with the demands of 4 bces, leaving insufficient memory for the kernel heap. (The 12 NICs in your box will make the demand for kernel memory for the receive rings considerably higher than in a "more typical" box.)

    Suggestions:

    • Use the amd64 build; or

    • remove or disable one or more of the Broadcom NICs; or

    • tweak the FreeBSD VM settings to better manage the available kernel address space. (It's been a while since I've done this, so I'd need to do some research if you want help with this.)

    I think if you search the forums for bce you'll find at least a couple of topics discussing bce resource requirements and the implications. Maybe at least one of those also involved a Dell R610.



  • I think you are right,

    I've seen this on the boot screen (it goes by fast during boot): "igb can't allocate receive structure".
    For information, I have two boxes with the same hardware. The first runs fine with a snapshot from November 27th; the second, which fails, is on snapshots from December 20th.

    I'll try an amd64 today to see what happens.

    Thank you !



  • @Juve:

    I've seen this on the boot screen (it goes by fast during boot): "igb can't allocate receive structure".

    I recall seeing a report of this problem which was made worse by multiple CPUs (something allocated per NIC per CPU).



  • amd64 can't boot:

    - with ACPI it stops at "Configuring device manager"
    - without ACPI it stops at "Setting up IPSEC random…."

    Any ideas?
    Should I try with only one core?



  • A bit more of the startup output might be helpful but it probably stops at a place where it is difficult to capture the output. Perhaps you could supply a screen photo.

    Might be worth trying the single CPU kernel.

    If you haven't already done so, it would probably be useful to have an i386 install on one disk (or slice) and an amd64 install on another disk (or slice), so that if this sort of thing happens again you can boot the i386 install and use its utilities to access the file system of the amd64 install.



  • Here is a capture (taken over IP KVM) of BETA4 amd64 booting with ACPI, 27 hours after boot.
    I pressed Ctrl+T a few times.

    With i386 it just hangs a bit later.

    I've done a full firmware update and run the diagnostic tools; everything is fine.
    Snapshots from the end of November run just fine.




  • Looks strange. Can you, for example, disable the 4 internal NICs so that only the add-on NICs are present in the system? And if that doesn't work, try with the add-on card(s) disabled or taken out of the system?
    Maybe that will clear things up…



  • I'll try this on Tuesday.

    The other box, running a snapshot from the end of November, hangs about once a week and needs a reboot. Before I disabled the captive portal it was hanging once a day.
    Mbuf usage is close to the max, perhaps because of the Broadcom devices:

    22542/1398/23940 mbufs in use (current/cache/total)
    22522/1212/23734/25600 mbuf clusters in use (current/cache/total/max)
    22520/648 mbuf+clusters out of packet secondary zone in use (current/cache)
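To put a number on "close to the max", the cluster line from the output above can be turned into a utilisation figure. This is a small Python sketch that assumes the four slash-separated values are current/cache/total/max, as the line itself states; the function name and dictionary keys are illustrative.

```python
import re

# The three lines quoted in the post, copied verbatim.
NETSTAT_M = """\
22542/1398/23940 mbufs in use (current/cache/total)
22522/1212/23734/25600 mbuf clusters in use (current/cache/total/max)
22520/648 mbuf+clusters out of packet secondary zone in use (current/cache)
"""

# Matches only the "mbuf clusters" line, which is the one carrying a max value.
CLUSTER_RE = re.compile(
    r"^\s*(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", re.MULTILINE
)


def cluster_usage(text):
    """Return cluster counts plus how much of the configured max is allocated."""
    match = CLUSTER_RE.search(text)
    if match is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    current, cache, total, maximum = (int(value) for value in match.groups())
    return {
        "current": current,
        "cache": cache,
        "total": total,
        "max": maximum,
        "headroom": maximum - total,
        "percent_of_max": 100.0 * total / maximum,
    }


if __name__ == "__main__":
    usage = cluster_usage(NETSTAT_M)
    print("%d of %d clusters allocated (%.1f%%), %d left"
          % (usage["total"], usage["max"], usage["percent_of_max"], usage["headroom"]))
```

With the figures above, 23734 of 25600 clusters are allocated, roughly 92.7% of the limit, leaving 1866 before allocations have to wait or fail.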



  • Still the same problem with snapshots from January 7th, on both the x86 and x64 platforms.
    php hangs in the keglimit state.

