Firewall looses L2/L3 connection, VLAN tagging - Intel igb driver
-
I meant the total rate of incoming SYN packets, rather than the rate per connection.
I assume they are from different source IPs?
Are you logging that traffic? That can end up consuming a lot of CPU. If you can block or pass without logging the firewall will remain up against a far bigger attack.
Steve
-
Please note I'm checking this while the firewall is functional. The weird thing is that in a packet capture taken on the firewall and exported to Wireshark, the number of SYNs is very small. I'm exporting NetFlow upstream, and that tells me 91% of TCP packets destined for WAN have the TCP SYN flag set:
- As a percentage of the total traffic, the SYNs are a very small share according to the Wireshark capture from the firewall WAN: about 0.3-0.5%. I'm not sure how I can check the rate.
- Yes, they are all from different source IPs.
- Yes. Initially I was not logging traffic on the firewall, but I have now enabled it to see what's going on.
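One way to check the rate is to count initial SYNs per second in the exported capture. This is only a minimal sketch, assuming the capture has already been reduced to (timestamp, TCP flags) pairs, for example from a Wireshark CSV export. The `syn_stats` helper and the sample records are hypothetical and are not data from this thread:

```python
from collections import Counter

TCP_SYN = 0x02  # SYN bit in the TCP flags byte
TCP_ACK = 0x10  # ACK bit in the TCP flags byte


def syn_stats(packets):
    """Summarise initial SYNs in an iterable of (timestamp_seconds, tcp_flags)."""
    total = 0
    syn_total = 0
    first_ts = None
    last_ts = None
    per_second = Counter()
    for ts, flags in packets:
        total += 1
        first_ts = ts if first_ts is None else min(first_ts, ts)
        last_ts = ts if last_ts is None else max(last_ts, ts)
        # Count only connection attempts: SYN set and ACK clear (skips SYN-ACK).
        if (flags & TCP_SYN) and not (flags & TCP_ACK):
            syn_total += 1
            per_second[int(ts)] += 1
    duration = (last_ts - first_ts) if total else 0.0
    return {
        "total": total,
        "syn": syn_total,
        "syn_share": syn_total / total if total else 0.0,
        "avg_syn_per_sec": syn_total / duration if duration > 0 else 0.0,
        "peak_syn_per_sec": max(per_second.values(), default=0),
    }


# Made-up sample records for illustration only.
sample = [(0.10, 0x02), (0.20, 0x10), (0.70, 0x02),
          (1.30, 0x12), (1.90, 0x18), (2.05, 0x02)]
stats = syn_stats(sample)
print(stats)
```

The peak per-second figure is probably the more useful number here, since a short burst of SYNs can disappear in a percentage taken over a whole capture.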
These are the stats from a 100k-packet capture on WAN:
-
Hi @xciter327 - reading through your post I had a few clarification questions:
- If you unplug the network cable from e.g. igb0 and plug it back in, is connectivity restored or is a reboot absolutely necessary?
- Does your Gateway latency spike right before the firewall stops working?
- Does everything stop working? Or, do existing connections still work and you just aren't able to open new connections?
- What type of internet connection do you have? That is, what's on the other side of the firewall, cable, DSL, fiber, etc.?
Thanks in advance.
-
Hi @tman222, I will try to answer as best I can:
- When I unplug the cable, a hot-plug event is correctly registered on the interface, but connectivity is not restored. A reboot is absolutely required. I have not found another way to restore connectivity so far.
- I don't do gateway monitoring due to various bugs we have encountered with dpinger so far. We do have smokeping set up, and it does not register any spikes to the firewall in question.
- Everything stops working. VLAN tagging also stops working, as I mentioned before.
- On this particular connection we have a leased fiber with an Ethernet circuit on it (so an L2 connection). However, I have also seen this happen at another customer, where the firewall is connected to one of our own switches. On that connection we have "dark fiber" (so an L1 connection) between the switch and our datacenter.
-
Thanks @xciter327 - a couple additional questions came to mind:
- Have you checked whether a BIOS or firmware update might address this issue?
- What power management settings have you configured in pfSense?
- When the system locks up, have you tried shutting the system down completely, including unplugging the power? Then plug the power back in and restart the system. Does that have any impact on how long it stays up or whether it still crashes?
Thanks again - hope this helps.
-
- Yes. The latest BIOS is installed. Firmware for the card is not updatable via the Intel utility as far as I can see. However, I have applied the "-WOLD" flag successfully and have not had any issues since. Still under testing. At another site where I have 2 HA units, I've set "-WOLD" on one but not on the other for some A/B testing.
- PowerD is enabled and set to "Maximum" for all three options.
- No, I have not. I just do a "reset" via IPMI. What benefit do you think fully power cycling the system will give?
-
@xciter327 said in Firewall looses L2/L3 connection, VLAN tagging - Intel igb driver:
- Yes. The latest BIOS is installed. Firmware for the card is not updatable via the Intel utility as far as I can see. However, I have applied the "-WOLD" flag successfully and have not had any issues since. Still under testing. At another site where I have 2 HA units, I've set "-WOLD" on one but not on the other for some A/B testing.
- PowerD is enabled and set to "Maximum" for all three options.
- No, I have not. I just do a "reset" via IPMI. What benefit do you think fully power cycling the system will give?
Hi @xciter327 - Regarding 3. I had an interesting situation on my Supermicro system where an SFP+ port would stop working and wouldn't start working again until I shut the system down completely, removed all power, and started it back up. I thought it could be interesting to try in case a complete shutdown resets something that may be impacting the behavior that you are observing.
Hope this helps.
-
Mmm, I've certainly seen ix ports get stuck in a mode that survives a reboot. Only a complete power cycle cleared it.
Of course that doesn't explain why it fails initially.
Steve
-
We have not had any issues of the sort with the Atom board. I'm having some issues with Intel X710 adapters, but those are very obviously driver related.
On topic, I have the box now in the office and I'll try to reproduce the issues following another round of memtests. I have a sneaking suspicion that it might be related to the number of interrupts generated by the igb driver. I wonder if it could be hitting the limit that is mentioned in the documentation for ix network adapters:
hw.intr_storm_threshold=1000 (default) is suggested to be raised to 10k. I've seen the igb driver generating about 7-8k on a utilized gigabit link (per interface, that is). Overall, on igb systems most of the CPU time seems to be consumed by the igb interrupt handlers, by the looks of it.
-
Just wanted to mention the problematic box has passed roughly 6 days of memtesting without errors. I'll probably script some flent tests to run as a next step.
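While such tests run, the per-queue interrupt rates could be compared against the two hw.intr_storm_threshold values mentioned above (1000 and 10k). This is only a rough sketch: it assumes the three-column "interrupt / total / rate" layout printed by FreeBSD's `vmstat -i`, the sample text is invented rather than taken from the affected box, and the helper names are hypothetical. It compares reported average rates against a chosen limit and does not model how the kernel actually decides that a storm is happening:

```python
def parse_vmstat_i(text):
    """Return {interrupt_source: rate} from `vmstat -i` style text."""
    rates = {}
    for line in text.splitlines():
        # Split off the last two columns (total, rate); the rest is the name.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue  # header, blank, or otherwise unparseable line
        name = parts[0].strip()
        if name == "Total":
            continue  # summary row, not an interrupt source
        rates[name] = int(parts[2])
    return rates


def over_limit(rates, limit, driver="igb"):
    """Sources belonging to `driver` whose reported rate is at or above `limit`."""
    return {n: r for n, r in rates.items() if driver in n and r >= limit}


# Invented sample output for illustration only.
SAMPLE = """\
interrupt                          total       rate
irq256: igb0:que 0              90000000       7500
irq257: igb0:que 1              96000000       8000
irq264: igb1:que 0               1200000        100
cpu0:timer                      12000000       1000
Total                          199200000      16600
"""

rates = parse_vmstat_i(SAMPLE)
print("at or above 1000:", over_limit(rates, 1000))
print("at or above 10000:", over_limit(rates, 10000))
```

Logging this once a minute during a load test would at least show whether the igb queues are anywhere near the chosen limit before the box stops responding.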
-
I have not had time to script tests yet. One of the 2 brand-new boxes with the same hardware and "WOL" disabled froze a couple of days ago as well. The previous box's console was still interactive when the issue happened. This one was a full freeze, not reacting to any input.