Dell R710 Port Flapping



  • Hello,

    I have a Dell R710 server (with 4 Broadcom NICs) running pfSense 2.4.2-RELEASE-p1, and I am experiencing port flapping on the NIC port assigned to VLAN1.
    The setup is the following:

    • bce0 --> WAN over PPPoE
    • bce1 --> unused
    • bce2 --> LAN (VLAN1) --> connected to a Cisco SG200-26 switch
    • bce3 --> LAN (VLAN10) --> connected to the same Cisco switch

    Randomly, port bce2 starts flapping for a couple of seconds. Sometimes it doesn't even recognize the LAN cable that is plugged in.

    I've followed this article: https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards and created the loader.conf.local file in /boot:

    kern.ipc.nmbclusters="131072"
    hw.bce.tso_enable=0
    hw.pci.enable_msix=0

    The issue is less frequent, but still occurs.

    I've tried connecting the LAN to a 100 Mbps Cisco switch (an old Catalyst), and the problem doesn't seem to occur there.

    Can anyone give me some advice?

    Thanks,
    Andy



  • Tried using bce1 instead? Maybe bce2 is simply broken.



  • Same issue.
    It works for hours, then suddenly the port shuts down.
    For example I watched more than 12 hours of Netflix without any issue, then suddenly, it happened.
    Is there any fix to this or even some logs I could see to try to identify the cause?

    Thanks,
    Andy



  • @pbnet:

    Not sure about the logs. This seems to be a link-layer issue, doesn't it? If it happens with one switch and one port, but not other switches or other ports…
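
    On the logs question: FreeBSD normally writes a kernel message of the form "bce2: link state changed to DOWN" (or UP) whenever a link flaps, so counting those lines per interface gives a rough picture of how often it happens. A minimal sketch, assuming the system log text has already been saved somewhere readable; the sample lines, timestamps and hostname below are made up for illustration, and count_link_changes is just a hypothetical helper name:

```python
import re
from collections import Counter

# FreeBSD kernel link messages look like:
#   "kernel: bce2: link state changed to DOWN"
LINK_RE = re.compile(r"(?P<ifname>\w+): link state changed to (?P<state>UP|DOWN)")


def count_link_changes(lines):
    """Return {interface: Counter({'UP': n, 'DOWN': m})} for the given log lines."""
    changes = {}
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            changes.setdefault(match["ifname"], Counter())[match["state"]] += 1
    return changes


# Illustrative sample only (not taken from a real log).
sample_log = [
    "Jan 10 03:12:01 fw kernel: bce2: link state changed to DOWN",
    "Jan 10 03:12:04 fw kernel: bce2: link state changed to UP",
    "Jan 10 03:12:05 fw kernel: bce2: link state changed to DOWN",
    "Jan 10 03:12:09 fw php-fpm: unrelated message",
]

for ifname, states in sorted(count_link_changes(sample_log).items()):
    print(f"{ifname}: {states['DOWN']} down, {states['UP']} up")
```

    If the DOWN events only ever show up on the port facing one particular switch, that points at the link partner rather than at pfSense itself.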



    I recently had some problems with a Dell switch where EEE (Energy-Efficient Ethernet) was causing the port to flap.
    Disabling EEE support on the port in question helped solve this.



  • Done that.
    So far I have: Uptime: 1d 01:33:12
    Weird, since there is permanent traffic on that NIC, so it should not go into any sort of power management.
    So far it works… I'll keep an eye on it...

    Thanks for the hint.



  • @pbnet:

    Sometimes it's not so much the EEE state changes as simply broken negotiation that makes the ASIC on either end just bork.



    Well, the idea of EEE is to put the link to sleep (low-power idle) when it's "not being used", even if the link is there.
    This makes (maybe?) sense in an environment with end-user devices such as desktop PCs, which just burn power when the link is up (the PC has standby power but isn't switched on) but not actually used.
    The PHY of a gigabit network port draws around 1 W of power whether it's used or not (times 2, since there are 2 ends of the link).
    In an office environment where you have potentially hundreds of workstations, it really makes sense to power the link down when it's not being used.
    However, since EEE is implemented purely in the PHY, it's kind of hard for it to know whether the device attached to its MII-type interface (SGMII, RGMII and the like) is even switched on, other than by counting the time since the tx_en line was last active.
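
    To put the power argument into numbers, here is a rough back-of-the-envelope sketch. The roughly 1 W per PHY comes from the figure above; the 300 workstations and 16 idle hours per day are made-up example values, and the result is an upper bound that assumes an idle link draws nothing at all, which real hardware will not achieve:

```python
def eee_idle_savings_kwh(links, idle_hours_per_day, phy_watts=1.0, days=365):
    """Upper-bound energy (kWh) saved if both PHYs of each link drew nothing while idle."""
    watts_per_link = 2 * phy_watts  # one PHY at each end of the link
    return links * watts_per_link * idle_hours_per_day * days / 1000.0


# Hypothetical office: 300 workstation links, idle 16 hours a day.
office = eee_idle_savings_kwh(links=300, idle_hours_per_day=16)

# A single infrastructure link with the same (assumed) idle time.
single = eee_idle_savings_kwh(links=1, idle_hours_per_day=16)

print(f"office: {office:.0f} kWh/year, single link: {single:.2f} kWh/year")
```

    Under these assumptions the office case comes out in the thousands of kWh per year, while a single link stays around a dozen kWh, which is very little to trade against an unstable uplink.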

    I work for a manufacturer of access points and in our experience there are way too many combinations of PHYs which "sometimes" just make a wrong decision and switch the link off even when it shouldn't. (Dell / Atheros is such a combination…)
    This doesn't really hurt on a desktop PC, where the most you notice is a short hiccup.

    For permanently installed backbone/infrastructure links, which should never go down, it makes sense to just disable EEE on the respective ports.



    Thanks a lot for every hint provided.
    So far all looks good.

    WAN Uptime: 4d 12:23:21

