Chinese I226-V on 23.05.1, problems
-
I have two firewalls running in CARP. On the LAN side, both are connected to a single Buffalo BS-MP2008 managed switch. Both have Intel i226-V cards installed, which are used for both LAN and WAN. On the WAN side, they are also connected to one cheap unmanaged 2.5 Gbit switch. The interface statistics on the WAN side look OK: no errors, no collisions. The statistics on the LAN side show errors and collisions. The error count differs between the two firewalls. I accept it because it has always been there. The collision count, however, is the same on both firewalls. It never appeared with the other cards I have tried (I225-V, X550, and Realtek before that), which reported only errors and no collisions.
I replaced the 225 with the 226 to diagnose why the WAN (igc0) port sometimes shows as down, roughly once a day, which breaks the connection. The funny thing is that the connection breaks on the WAN (igc0) interface only when Suricata is running in inline mode on the LAN (igc1) interface. I don't know, maybe some PCIe controller buffer overrun? These cards do appear to have an ASMedia PCIe controller on board.
The cards are, of course, a Chinese product from AliExpress, but they have no problems under load, for example during hours of iperf testing. With Suricata inline mode enabled, the connection usually breaks randomly once every day or two, regardless of load. With Suricata disabled, everything can run for weeks without these disconnections.
Yes, I know that buying network cards on AliExpress is not a good idea, but I'm not entirely sure that is the cause here.
Is anyone else using these 225/226 cards, maybe embedded ones, and has anyone seen something similar?
-
Never seen that on the i226 or i225 NICs in the Netgate devices.
When it shows as down, is it actually down? Does the NIC link LED show down? Does the other end of the link show a disconnection? Does the LAN NIC still work, even though that's the side running with netmap?
Steve
-
@stephenw10
I probably explained that rather chaotically.
This down/up happens within milliseconds, I think, and at random, so I did not check the LED status. That would only be possible if I set up a camera to record it.
The second statement about Suricata was definitely wrong. It only changed the interval between these random disconnections, which is sometimes 48+ hours, but I have not tested this for long.
@stephenw10 said in Chinese I226-V on 23.05.1, problems:
The LAN NIC still works? Even though that's the side running with netmap?
So far, igc1 has worked just fine. I am planning to do some tests. Since I have two WANs on NICs of different brands, I want to swap igc0 with re0, so that PPPoE moves to re0, and see what happens then.
BTW, on the main unit I have now installed an X550-T2, and PPPoE on ix0 has been rock stable for more than 5 days. All other settings are the same; I just reconfigured the interfaces and port speeds.
-
Hmm, so what actually happens when it loses link? You see it logged, I assume? Does it instantly reconnect?
-
@stephenw10
In the logs, I see the igc0 down event, followed by a bunch of other events happening right after it, like the PPPoE disconnection and so on. Nothing unusual: it looks as if you had pulled the cable out for a second, maybe, I don't know. The port down event is somewhere in the middle of all this mess, I believe a bit later than the down event. I no longer have the logs, sorry, but I remember it was within the same minute at least, if not the same second.
-
What's it actually connected to? Can you put a switch in between as a test?
-
@stephenw10
I tried 3 different switches. It's currently connected to a dumb 2.5G Zyxel switch. I also tried a 1G TP-Link and a 2.5G TP-Link. No difference at all.
-
@stephenw10
Hmm....
After six days, something similar has now happened on the ix0 interface on the main unit:
Jul 24 02:10:11 kernel ix0: link state changed to UP
Jul 24 02:10:11 check_reload_status 480 Linkup starting ix0
Jul 24 02:10:11 check_reload_status 480 Reloading filter
Jul 24 02:10:11 php-fpm 15509 /rc.linkup: Removing static route for monitor 8.8.8.8 and adding a new route through 199.0.100.1
Jul 24 02:10:11 php-fpm 15509 /rc.linkup: Shutting down Router Advertisement daemon cleanly
Jul 24 02:10:11 ppp 39147 [wan] IFACE: Set description "WAN"
Jul 24 02:10:11 ppp 39147 [wan] IFACE: Rename interface pppoe0 to pppoe0
Jul 24 02:10:11 ppp 39147 [wan] IFACE: Down event
Jul 24 02:10:11 check_reload_status 480 Rewriting resolv.conf
Jul 24 02:10:09 php-cgi 1355 rc.kill_states: rc.kill_states: Removing states for interface pppoe0
Jul 24 02:10:09 php-cgi 1355 rc.kill_states: rc.kill_states: Removing states for IP fe80::a236:9fff:fec3:4a2c%pppoe0/32
Jul 24 02:10:09 ppp 39147 [wan] IPV6CP: LayerDown
Jul 24 02:10:09 ppp 39147 [wan] error writing len 8 frame to b0: Network is down
Jul 24 02:10:09 ppp 39147 [wan] IPV6CP: SendTerminateReq #2
Jul 24 02:10:09 ppp 39147 [wan] IPV6CP: state change Opened --> Closing
Jul 24 02:10:09 ppp 39147 [wan] IPV6CP: Close event
Jul 24 02:10:09 ppp 39147 [wan] IFACE: Removing IPv4 address from pppoe0 failed(IGNORING for now. This should be only for PPPoE friendly!): Can't assign requested address
Jul 24 02:10:09 check_reload_status 480 Rewriting resolv.conf
Jul 24 02:10:09 php-cgi 98827 rc.kill_states: rc.kill_states: Removing states for interface pppoe0
Jul 24 02:10:08 php-cgi 98827 rc.kill_states: rc.kill_states: Removing states for IP xx.yy.21.204/32
Jul 24 02:10:08 ppp 39147 [wan] IPCP: LayerDown
Jul 24 02:10:08 ppp 39147 [wan] error writing len 8 frame to b0: Network is down
Jul 24 02:10:08 ppp 39147 [wan] IPCP: SendTerminateReq #4
Jul 24 02:10:08 ppp 39147 [wan] IPCP: state change Opened --> Closing
Jul 24 02:10:08 ppp 39147 [wan] IPCP: Close event
Jul 24 02:10:08 ppp 39147 [wan] IFACE: Close event
Jul 24 02:10:08 ppp 39147 caught fatal signal TERM
Jul 24 02:10:07 php-fpm 15509 /rc.linkup: DEVD Ethernet detached event for opt1
Jul 24 02:10:07 php-fpm 15509 /rc.linkup: Hotplug event detected for ISP_LAN(opt1) dynamic IP address (4: dhcp)
Jul 24 02:10:05 kernel ix0: link state changed to DOWN
Jul 24 02:10:05 check_reload_status 480 Linkup starting ix0
Jul 24 01:01:46 php-cgi 74734 notify_monitor.php: Message sent to -@gmail.com OK
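The excerpt above is listed newest first, so it has to be reversed to read cause and effect. As a hedged illustration (this is not a pfSense tool), the Python sketch below does three things. It pulls only the kernel `link state changed to UP/DOWN` messages out of such an excerpt. It puts them in chronological order. It then pairs each DOWN with the next UP on the same interface to estimate how long the link was down. Syslog timestamps carry no year, so the year 2023 is an assumption. They also have one-second resolution, which means sub-second flaps show up as 0 seconds. Lines that share a timestamp cannot be ordered reliably this way.

```python
import re
from datetime import datetime

# Matches kernel link-state messages such as
# "Jul 24 02:10:05 kernel ix0: link state changed to DOWN".
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) kernel (?P<iface>\w+): "
    r"link state changed to (?P<state>UP|DOWN)"
)

ASSUMED_YEAR = 2023  # assumption: syslog lines carry no year


def link_outages(lines, newest_first=True):
    """Return (iface, down_time, up_time, seconds) for each DOWN -> UP pair."""
    events = []
    for line in lines:
        m = LINK_RE.match(line.strip())
        if m:
            ts = datetime.strptime(f"{ASSUMED_YEAR} {m['ts']}", "%Y %b %d %H:%M:%S")
            events.append((ts, m["iface"], m["state"]))
    if newest_first:
        events.reverse()  # GUI log view is newest first; process oldest first
    down_since = {}
    outages = []
    for ts, iface, state in events:
        if state == "DOWN":
            down_since[iface] = ts
        elif iface in down_since:  # an UP that closes an open DOWN
            start = down_since.pop(iface)
            outages.append((iface, start, ts, (ts - start).total_seconds()))
    return outages


# Usage with the relevant lines from the excerpt above (newest first).
log = [
    "Jul 24 02:10:11 kernel ix0: link state changed to UP",
    "Jul 24 02:10:11 check_reload_status 480 Linkup starting ix0",
    "Jul 24 02:10:05 kernel ix0: link state changed to DOWN",
    "Jul 24 02:10:05 check_reload_status 480 Linkup starting ix0",
]
for iface, down, up, secs in link_outages(log):
    print(iface, down.time(), up.time(), secs)  # prints: ix0 02:10:05 02:10:11 6.0
```

Only kernel messages are used because they come straight from the driver. The `check_reload_status` and `rc.linkup` lines are reactions to those events, so they add no independent timing information.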
Those two lines in question…
Jul 24 02:10:05 kernel ix0: link state changed to DOWN
Jul 24 02:10:05 check_reload_status 480 Linkup starting ix0
Since the timestamp is the same…
I believe check_reload_status ran afterwards, because ix0 went down, didn't it?
-
Do the Suricata logs show it restarting when that happens? In inline mode (netmap) it will bounce the link if the netmap interface is recreated. But that should be on the LAN side....
-
@stephenw10
To clarify, and to draw your attention to it: this is my main (primary) unit, where the igc card was replaced for testing with an ix (X550-T2) card. There is nothing in the Suricata logs. I don't think it's Suricata or netmap. It looks more like a card, driver, or some kernel-level failure… or I don't know what else it could be. Is there something in FreeBSD that both ix and igc use at some level?
Now I am testing the secondary unit's igc0 under a 100 Mbit iperf load. The port speed is 2500, and a switch sits between the test server and the igc0 port.
-
Well, the logs show the link bouncing. So either it really did bounce, in which case I'd try to confirm that from the switch logs, or it was a virtual interface of some sort, like netmap. The only time I have seen the link bounced by something in software (other than a NIC config change) is when using Snort or Suricata in inline mode.
-
@stephenw10
Well, how can I disable netmap completely?
-
Put Suricata in legacy mode. It uses netmap for in-line mode.
-
@stephenw10
I've removed Suricata completely, but I still see
ix1: netmap queues/slots: TX 4/2048, RX 4/2048
in the dmesg output for all the cards. Is this normal?
-
I don't want to jump to conclusions, but at the moment I suspect some kind of dependence between these connection breaks, netmap (which, as I understand it, is built into the kernel), and PPPoE. I can't imagine what kind of dependence, but when the port on the 226 is used for a normal DHCP connection over a similar link instead of PPPoE, the link is stable.
At the moment, I have replaced the cards in both test-lab firewalls with genuine Intel X550-T2 cards running the latest firmware.
I also removed Suricata, and we will see whether the situation repeats itself next month.
-
@w0w said in Chinese I226-V on 23.05.1, problems:
is this normal?
Yes. The driver shows that as available queues when it attaches:
ix0: <Intel(R) X553 N (SFP+)> mem 0x80400000-0x805fffff,0x80604000-0x80607fff at device 0.0 on pci9
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 4 RX queues 4 TX queues
ix0: Using MSI-X interrupts with 5 vectors
ix0: allocated for 4 queues
ix0: allocated for 4 rx queues
ix0: Ethernet address: 00:08:a2:12:17:7e
ix0: eTrack 0x8000084b PHY FW V65535
ix0: netmap queues/slots: TX 4/2048, RX 4/2048
That doesn't mean that netmap itself is in use.
-
Preliminary information from one of the firewalls: the X550-T2 works without problems. When the connection is interrupted, it is only by the provider, at a regular interval of a known number of days. The link never goes down as it did with igc.
-
Hmm, so the physical link stays up and only the PPPoE is restarted?
-
@stephenw10
Yes, exactly.
-
Hmm, I have no idea what would cause that on igc but not on ix, unless it's actually the igc link dropping that causes PPPoE to reset. I've never seen it on any of our igc NICs, but that is the reported symptom of those early igc NICs in Linux or Windows.