igc0: link state changed to DOWN
-
I have two pfSense+ firewalls running 23.05 in a CARP pair. The hardware is different but the NICs are similar: both machines use igc0: <Intel(R) Ethernet Controller I225-V> for WAN, and both have started dropping and restoring the link at random. WAN is PPPoE and this happens only while PPPoE is active; PPPoE is active only while the firewall is MASTER. Both have Suricata enabled in inline mode. Has anyone seen similar behavior?
-
What do you see logged? Is it truly random or when some other event happens?
What revision are those NICs? The rev1/2 i225-V NICs have a known fault that presents as random loss of link. Though I have one device with them and I've never seen it.
Do you have the parent interface assigned as well as PPPoE on it? Do you have a CARP VIP on there?
Technically PPP connections are not supported across HA but I know there are people doing that.
Steve
-
@stephenw10 said in igc0: link state changed to DOWN:
What do you see logged? Is it truly random or when some other event happens?
I see "igc0: link state changed to DOWN" logged several times. It looks random; at least I have not noticed any connection with other events. For now I have gone back to 23.01 on one of the firewalls for testing purposes. The drops usually happen numerous times in 24 hours. We'll see.
What revision are those NICs? The rev1/2 i225-V NICs have a known fault that presents as random loss of link. Though I have one device with them and I've never seen it.
It looks like it's revision 3, at least that's what the product title said. Where do I look in FreeBSD?
igc0@pci0:5:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x15f3 subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I225-V'
    class      = network
    subclass   = ethernet
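For anyone else wanting to check the revision: the output above has the shape of what `pciconf -lv` prints on FreeBSD, and the `rev=` field is the PCI revision. A minimal sketch of pulling that field out for one NIC follows; the helper name `nic_rev` is made up for illustration, and it reads the `pciconf` text from stdin.

```shell
# nic_rev is a hypothetical helper name, not a system command.
# Reads `pciconf -lv` output on stdin and prints the rev= value
# of the device whose name is given as $1 (for example igc0).
nic_rev() {
    sed -n "s/^$1@.* rev=\(0x[0-9a-fA-F]*\).*/\1/p"
}

# On the firewall this would be used as:
#   pciconf -lv | nic_rev igc0
```

Fed the igc0 line from this post, it prints 0x03.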
Do you have the parent interface assigned as well as PPPoE on it? Do you have a CARP VIP on there?
Yes, I have the parent interface assigned. No, there is no CARP VIP on it, only on the LAN side.
Technically PPP connections are not supported across HA but I know there are people doing that.
I am using a script that periodically checks whether the firewall is MASTER and brings PPPoE up, or takes it down if it's BACKUP...
But the script is not being run at the moments when the link disappears: it writes its actions to the log, and there are no entries at those times.
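The actual script is not shown in this thread, so the following is only a sketch of the decision such a script has to make, under the assumption that it reads the CARP state from `ifconfig` output (FreeBSD prints a line like `carp: MASTER vhid 1 advbase 1 advskew 0` for an interface carrying a CARP VIP). The function names are hypothetical, and the command that actually raises or drops PPPoE is deliberately left out because it is not given here.

```shell
# Hypothetical helpers; the real script from this post is not shown.

# Reads `ifconfig <iface>` output on stdin and prints the first CARP state
# found (MASTER, BACKUP or INIT), or nothing if there is no carp line.
carp_state() {
    sed -n 's/^[[:space:]]*carp: \([A-Z]*\) vhid.*/\1/p' | head -n 1
}

# Maps a CARP state to the wanted PPPoE state: up only on MASTER.
wanted_pppoe() {
    if [ "$1" = "MASTER" ]; then
        echo "up"
    else
        echo "down"
    fi
}

# Periodic use on the firewall (LAN_IF is an assumed variable naming the
# interface that carries the CARP VIP):
#   wanted_pppoe "$(ifconfig "$LAN_IF" | carp_state)"
```

Treating anything other than MASTER as "down" keeps PPPoE off when the state cannot be read.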
There is another problem, which is how I actually noticed that something was wrong: even after the link comes back up, unbound stays in this state:
May 9 05:40:52 unbound 27963 [27963:0] info: service stopped (unbound 1.17.1).
The GUI also stops responding. If I restart the GUI and PHP from the console I get a message about an XMLRPC lock or something like that.
I also forgot to mention that there is a second WAN interface on the same igc hardware. Its link does not disappear. The same goes for the LAN interface, which is also igc and has no problems either. So there are 4 igc interfaces on one firewall and only one of them changes its link state.
There is a failover gateway configured, and failover usually works fine.
If I start unbound manually, everything works again.
As for Suricata, it is enabled only on the LAN interface, which in theory should not affect PPPoE. I'm not sure about this, though, because in early versions (23 alpha and also 22.05), when it was buggy enough, it crashed everything on the firewall.
I'll do more testing on 23.01 and maybe something else will become clear.
-
The known issues with i225-V are EEE related as I understand it. EEE should be disabled by default but if for some reason it's enabled on your NICs it could be that the PPPoE link is the only one that supports it. Or maybe supports it incorrectly.
Check
sysctl dev.igc.0.eee_control
and
sysctl hw.igc.eee_setting
Both should be 1 if eee is disabled, despite the description text for hw.igc.eee_setting.
Steve
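The check described in this post can be wrapped in a few lines of shell. This is a sketch only: the function name `eee_state` is made up, and the rule that a value of 1 on both OIDs means EEE is disabled is taken from the post above, not verified independently.

```shell
# eee_state is a hypothetical helper name.
# $1 = value of dev.igc.0.eee_control, $2 = value of hw.igc.eee_setting.
# Per the post above, both being 1 means EEE is disabled.
eee_state() {
    if [ "$1" = "1" ] && [ "$2" = "1" ]; then
        echo "EEE disabled"
    else
        echo "EEE may be enabled"
    fi
}

# On the firewall (sysctl -n prints only the value):
#   eee_state "$(sysctl -n dev.igc.0.eee_control)" "$(sysctl -n hw.igc.eee_setting)"
```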
-
@stephenw10 said in igc0: link state changed to DOWN:
The known issues with i225-V are EEE related as I understand it. EEE should be disabled by default but if for some reason it's enabled on your NICs it could be that the PPPoE link is the only one that supports it. Or maybe supports it incorrectly.
Check
sysctl dev.igc.0.eee_control
and
sysctl hw.igc.eee_setting
Both should be 1 if eee is disabled, despite the description text for hw.igc.eee_setting.
Steve
Yes, both are 1.
-
What are those ports connected to? You might try putting a switch in between as a test.
-
@stephenw10
Well, there is a switch already, and I'm beginning to suspect that it may be the cause, because reverting to 23.01 didn't help. I'll try replacing it with a spare one and see.
-
I tried another switch; unfortunately it did not help. I then decided to check the configuration files for igc driver settings and found some that were once recommended here on the forum or somewhere else.
I deleted all of those settings at once, and so far the link has been stable for more than 24 hours, which is positive.
-
Hmm, what settings did you have in there?
-
@stephenw10
Dug a little deeper.
dev.igc.0.enable_aim="1"
Not sure this one does anything at all, because it's an unknown OID. Yes, it was stupid of me not to check whether it was working…
dev.igc.0.fc="0"
This could be the reason.
dev.igc.0.eee_control="0"
I don't remember why I put it in there, but when you asked I did check that it was "1" in sysctl. I saved partial output of sysctl dev.igc.0 three days ago:
dev.igc.0.watchdog_timeouts: 0
dev.igc.0.rx_overruns: 0
dev.igc.0.link_irq: 2
dev.igc.0.dropped: 0
dev.igc.0.eee_control: 1
dev.igc.0.itr: 488
dev.igc.0.tx_abs_int_delay: 66
dev.igc.0.rx_abs_int_delay: 66
dev.igc.0.tx_int_delay: 66
dev.igc.0.rx_int_delay: 0
I am not sure what the real state of dev.igc.0.eee_control was while it was set to "0" in the loader, since sysctl showed a different value. I still hope that it was 1.
-
I'm not sure that does anything as a loader variable anyway. Generally values like
hw.igc.x
are loader variables and are read-only as a sysctl. And values like
dev.igc.x
are sysctls and should be set in the system tunables. But that's not true for all drivers!
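As a rough illustration of that rule of thumb, the sketch below sorts a setting name by its prefix. The function name `where_to_set` is made up, and, as the post says, the rule does not hold for every driver, so treat the answer as a first guess and confirm on the box that the OID exists (for example with `sysctl -d <name>`, which prints its description or an unknown oid error).

```shell
# where_to_set is a hypothetical helper name.
# Applies the prefix rule of thumb from the post above; NOT reliable for
# every driver.
where_to_set() {
    case "$1" in
        hw.*)  echo "loader variable" ;;   # read at boot, read-only as a sysctl
        dev.*) echo "system tunable" ;;    # runtime sysctl
        *)     echo "unknown" ;;
    esac
}

# Examples with names from this thread:
#   where_to_set hw.igc.eee_setting
#   where_to_set dev.igc.0.fc
```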