Igb driver - interface flapping for no apparent reason!?
-
Have you tried the options mentioned here? https://forum.pfsense.org/index.php?topic=69486.0
Or just try these options in your /boot/loader.conf.local (one per line):
kern.cam.boot_delay=10000
kern.ipc.nmbclusters=1000000
hw.igb.num_queues=1
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1
hw.pci.enable_msix=0
hw.igb.enable_msix=0
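If you would rather not hand-edit the file, here is a minimal Python sketch of merging tunables into existing loader.conf-style text without creating duplicate entries. The function name `merge_tunables` is made up for this example, and it assumes the quoted `name="value"` form described in loader.conf(5). It only works on strings, so writing the result to /boot/loader.conf.local is left to you.

```python
from typing import Dict


def merge_tunables(existing: str, wanted: Dict[str, str]) -> str:
    """Return loader.conf-style text with every tunable in `wanted` set once."""
    remaining = dict(wanted)
    out = []
    for line in existing.splitlines():
        name = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and name in remaining:
            # Update the existing assignment instead of appending a duplicate.
            out.append(f'{name}="{remaining.pop(name)}"')
        else:
            # Keep comments, blank lines and unrelated tunables untouched.
            out.append(line)
    # Append the tunables that were not present yet, one per line.
    out.extend(f'{name}="{value}"' for name, value in remaining.items())
    return "\n".join(out) + "\n"
```

Calling it with the current file contents and a dict such as `{"kern.ipc.nmbclusters": "1000000", "hw.igb.num_queues": "1"}` returns the new file text, and a reboot is still needed before boot-time tunables take effect.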
-
Very interesting, but the post you linked was from 2013 and I would hope that the drivers/kernel have advanced beyond this! Hoping maybe someone from the development team can comment on whether this is still necessary?
-
There have been a lot of improvements since 2013, but igb4 still has a lot of bugs. AFAIK, kern.ipc.nmbclusters=1000000 at least is still needed to maintain stability. I would also recommend disabling TSO and LRO (System/Advanced/Networking). I have also been using hw.igb.num_queues=1 since 2.3; I know some things have been improved since then, but I still see problems reported again and again.
But honestly, I have never encountered a cyclic state change of the igb network adapter, so it could be a hardware issue as well. Or it depends on the hardware: in my case it just gives me a kernel panic and in yours it cycles the adapter. I am not sure.
-
Does it matter if I put these into loader.conf.local or just use the GUI's system tunables instead?
I prefer to keep stuff in the GUI / accessible so it's backed up with a config dump rather than having to remember that I put something "important" into loader.conf.local…
-
I hope it does not matter 8)
-
I've seen flapping with a lot of the ONUs that BrightHouse used and resolved it by manually setting the interface speed. Only other time I've seen it is a bad switch. I've never seen it where my pfSense router was the issue but anything is possible. What if you make a different port your LAN port to see if it is a bad port?
-
I had problems with the WAN side against an older Surfboard modem a looong time ago, but that was solved by hard-setting the mediatype to "autoselect" (instead of "Default (autoselect/driver's preference)").
That didn't seem to help here.
I haven't tried changing the port yet because it doesn't seem like a port hardware problem… I would expect consistent traffic problems rather than the weird intermittent sporadic issues I've had (that mostly seem to come up after rebooting the system for some reason).
hw.igb.num_queues is apparently only modifiable at boot-time, so it must go into /boot/loader.conf.local and can't be set via system_advanced_sysctl.php. :( It also doesn't seem to help the problem.
I'm wondering if it's somehow service-related… The only packages I'm running are darkstat, NUT, and Avahi. When I reboot the machine (yay for IPMI), I noticed the interface flapping behavior stop immediately after the console printed a notice about the Avahi shutdown, but that may have just been coincidence. I just disabled Darkstat for now (which I know puts the interface into promiscuous mode, so I'm wondering if that could have anything to do with it too).
The board I'm using needs to be RMA'd anyway due to the Atom clock/boot bug... I need to get in touch with Supermicro and hope they can do an "Advance RMA" where they send me a new board first.
-
If the system has created 4 queues for that port, it needs time to fill them up with packets. If that takes too much time, I am pretty sure that more than two of the installed packages would also be reporting issues on that pfSense system. A real port mismatch could be solved quickly by setting the proper port speed. Reducing the number of queues is one thing, and setting the right mbuf size is another.
On those boards, by default, the WAN port and the IPMI port are shared because of the BIOS settings.
You can change this quickly by rebooting and changing that setting in the BIOS, which might help.
-
I know that ASRock Rack boards really do have this problem: even if you have a dedicated port for IPMI, the first port is also shared for IPMI. This can cause a lot of problems, like huge packet loss or non-working IPMI, but I am not sure whether this also applies to Supermicro boards.
-
@BlueKobold:
I have already adjusted the num_queues to 1 as well as the nmbclusters to 1000000 and it did not help (system has been rebooted multiple times since adding these to loader.conf.local). This supermicro board's IPMI is a separate dedicated port on the motherboard, and there isn't even an option to "share" one of the main ports. Even if it could, I would expect that to be port 1 (igb0), which is my WAN port and hasn't shown any problems (yet).
The interfaces are both configured for "autoselect", which matches with the equipment on the other end, but I'm still having trouble.
Oddly, the WAN side is fine, it's only the LAN that is acting up. I'm at work right now and can SSH to the router (I have a VPN tunnel between work and home) and watch as igb1 keeps going from "no media" (thinks it is unplugged) to "active." I tried using igb2 as LAN earlier (which is currently unused) and it was doing the same thing, flapping up and down, so I went back to igb1.
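For anyone who wants to put a number on this kind of flapping instead of eyeballing it over SSH, here is a rough Python sketch. It assumes you have collected repeated `ifconfig igb1` dumps yourself (for example once per second). The helper names are made up for this example, and the exact status strings depend on the driver and link state, so the code only compares whatever text follows `status:`.

```python
import re
from typing import List, Optional

# Matches the "status: ..." line that ifconfig prints for an Ethernet interface.
STATUS_RE = re.compile(r"^\s*status:\s*(.+?)\s*$", re.MULTILINE)


def link_status(ifconfig_output: str) -> Optional[str]:
    """Return the text after 'status:' from one ifconfig dump, or None if absent."""
    match = STATUS_RE.search(ifconfig_output)
    return match.group(1) if match else None


def count_transitions(statuses: List[str]) -> int:
    """Count how often two consecutive samples differ (one link state change each)."""
    return sum(1 for prev, cur in zip(statuses, statuses[1:]) if prev != cur)
```

A stable link gives 0 transitions over the sampling window. A link that keeps dropping and coming back shows up as a steadily growing count, which makes it easier to compare before and after a settings change.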
I just altered igb1 to 1000-full (hardcoded) and now it seems to be stable again, but this isn't "correct" since the equipment on the other side is autonegotiating.
-
@zprime
I bumped this very old thread because it helped with my pfSense Plus 22.05 system. Sometimes I have had problems with a jammed connection; normally I reboot and everything works fine again.
Now that I have started using IPv6, I noticed the jamming problems very often.
I noticed that removing the LAN cable causes an unstable state, and it does not recover when the cable is plugged back in. After many hours of googling, I found this thread and it helped.
When I set a fixed 1000baseT full-duplex state on all my NICs, pfSense is solid again. I hope someone can explain why this very old bug still exists.
I have an Intel 4-port NUC.
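For testing from a shell, the fixed-media workaround can be sketched as below. This small Python example only builds the FreeBSD `ifconfig` command lines and does not run them. The interface names are an assumption for a 4-port box, and `fixed_media_argv` and `fixed_media_commands` are names invented for this sketch. A setting applied with `ifconfig` does not survive a reboot, so the persistent place for it remains the interface's mediatype setting in the GUI mentioned earlier in the thread.

```python
import shlex
from typing import Iterable, List


def fixed_media_argv(interface: str, media: str = "1000baseT",
                     duplex: str = "full-duplex") -> List[str]:
    """Build the argv that pins one interface to a fixed media type and duplex."""
    return ["ifconfig", interface, "media", media, "mediaopt", duplex]


def fixed_media_commands(interfaces: Iterable[str]) -> List[str]:
    """Return one printable shell command per interface."""
    return [shlex.join(fixed_media_argv(name)) for name in interfaces]


# Assumed interface names for a 4-port igb system; adjust to your hardware.
for command in fixed_media_commands(["igb0", "igb1", "igb2", "igb3"]):
    print(command)
```

To go back to autonegotiation for a test, `ifconfig igb1 media autoselect` is the counterpart command.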