In Errors in one interface vlan
-
Have a look in the sysctl stats for ixl3 to see what sort of errors those are. The error value shown there in 22.01+ is a sum of various different error types.
At the command line run: sysctl dev.ixl.3
Look for values like:
dev.ixl.0.mac.checksum_errors: 0
dev.ixl.0.mac.rx_length_errors: 0
dev.ixl.0.mac.remote_faults: 2
dev.ixl.0.mac.local_faults: 1
dev.ixl.0.mac.crc_errors: 0
dev.ixl.0.pf.rx_errors: 0
What is ixl3 connected to? How is it connected?
Steve
-
@stephenw10 I saw that in the documentation but didn't try it at the time.
Here is the command output for that interface:
dev.ixl.3.pf.rx_errors: 151964
dev.ixl.3.mac.crc_errors: 0
dev.ixl.3.mac.rx_length_errors: 150725
dev.ixl.3.mac.checksum_errors: 1239
dev.ixl.3.mac.remote_faults: 3
dev.ixl.3.mac.local_faults: 6
I have that interface connected to a switch through an OM3 fiber optic cable. There is an SFP transceiver at each end, but they are not original Intel SFPs; they are 10GTEK-brand SFPs with firmware coded for Intel (i.e. white-label). I have other SFPs, and the secondary firewall, to test with. I will let you know.
-
Almost all rx_length_errors then. So possibly something sending packets with a bad header length set?
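To put a number on "almost all", here is a minimal Python sketch that parses the counters pasted above and works out what share of pf.rx_errors each MAC counter accounts for. The function names are made up for illustration, and it assumes the text is in the usual "name: value" form that sysctl prints.

```python
# Counter values copied from the post above (dev.ixl.3).
SAMPLE = """
dev.ixl.3.pf.rx_errors: 151964
dev.ixl.3.mac.crc_errors: 0
dev.ixl.3.mac.rx_length_errors: 150725
dev.ixl.3.mac.checksum_errors: 1239
dev.ixl.3.mac.remote_faults: 3
dev.ixl.3.mac.local_faults: 6
"""


def parse_sysctl(text):
    """Parse 'name: value' lines into a dict, keeping only integer values."""
    counters = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")
        value = value.strip()
        if value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def error_shares(counters, unit=3):
    """Return each MAC error counter as a percentage of pf.rx_errors."""
    total = counters[f"dev.ixl.{unit}.pf.rx_errors"]
    prefix = f"dev.ixl.{unit}.mac."
    shares = {}
    for name, value in counters.items():
        if name.startswith(prefix) and name.endswith("_errors"):
            shares[name[len(prefix):]] = 100.0 * value / total if total else 0.0
    return shares


counters = parse_sysctl(SAMPLE)
shares = error_shares(counters)
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

With these figures rx_length_errors accounts for roughly 99% of the total, and rx_length_errors plus checksum_errors add up exactly to pf.rx_errors.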
Swapping cables/ports is always a good test though.
-
@stephenw10 With the secondary firewall I didn't have any packet loss for 4 days in a row, but after that I got periods with lots of it, to the point where it is totally offline (periods of around 10 minutes). The secondary firewall uses a different physical connection (same SFPs and fiber type), and it's connected to another switch. There are also only In errors on the same network, and after I ran that command I got lots of rx_length_errors.
Both have those white-label SFPs with the same firmware for Intel transceivers, but on the other end I have others from the same brand with HPE firmware. On the HPE switches there are no errors detected.
I have that device with packet loss issues on the same VLAN 1 untagged, together with my watchdog that detects it (on the same switch). Isn't packet loss like that an issue on the switch side (where I have all the devices connected), rather than with the firewall itself? Could those rx_length_errors be unrelated to this device's packet loss? If you need more info just let me know.
-
Input errors like that would be unlikely to cause a complete disconnect unless they are showing as a far higher percentage of the total packets. It could certainly cause packet loss though.
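As a rough way to judge that percentage, here is a small Python sketch. The error count is the pf.rx_errors value posted above; the total received packet count is a made-up placeholder, since no total was posted in this thread. On pfSense (FreeBSD) the Ipkts and Ierrs columns of netstat -i are one place such totals could come from.

```python
def input_error_percent(input_errors, input_packets):
    """Input errors as a percentage of received packets (0.0 if no packets)."""
    if input_packets == 0:
        return 0.0
    return 100.0 * input_errors / input_packets


# 151964 is the pf.rx_errors value from the thread.
# 500_000_000 is a HYPOTHETICAL packet total, used only for illustration.
rate = input_error_percent(151964, 500_000_000)
print(f"input errors: {rate:.4f}% of received packets")
```

If the real ratio turned out to be a small fraction of a percent like this, it would fit occasional packet loss better than a link going fully offline.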
What's the difference between these two sites though? The switches are different? SFP modules different?
This looks like a pretty low level issue. I'm not sure pfSense can do anything about it directly.
Steve
-
@stephenw10 but packet loss between devices on the same network, connected to the same switch?
This is the network layout:
SFP modules are the same per device type, brand and model.
Switch LAN A is equal to Switch LAN B, and Router A1 WAN1 is equal to Router A2 WAN2, in terms of models.
Right now I don't have one of the fiber cables on that trunk connected, but I don't think it could be caused by that. Each port is running at 10Gbps (with two it will be 20Gbps), as are the others, except at client level, such as those ESXi hosts and the other device with packet loss.
-
So by removing one link in a LAG between the switches the packet loss and errors go away?
That is almost certainly a switch config issue if so. Something that link is carrying is creating a conflict or loop somehow.
Steve
-
@stephenw10 No, I simply hadn't connected it until my last post. Since it was happening with those two links as a trunk, I tried using only one as a test, but it made no difference. Now I have both links connected under the same trunk.
The device where I am having those issues still shows the same behaviour.
It must be a switch-side issue, but there are no errors on either unit. I've even created a new thread on the HPE Aruba community to see if anyone can help me.
-
Your diagram shows igb NICs; is that a mistake? Should they be ixl?
Do you have multiple VLANs on those NICs? But you're only seeing packet loss on ixl3.11?
Steve
-
@stephenw10 yes, they were supposed to be ixl interfaces, but I used the igb ones from my previous firewall when I made that diagram last year.
I have several tagged VLANs on the ixl3 interface, and the one where I noticed those issues is ixl3 itself, with VLAN 1 untagged. The device with packet loss is there, and the other one with latency issues was also there when I used the primary firewall. Even on the firewalls I have those In Errors on that network.
I think I may have a loop within the switches: not a physical connection loop, but something between switches. I don't have a loop between the switches in this infrastructure, but all the symptoms look like there is (at least) one somewhere. As a test I will try loop protection and, after that, maybe Spanning Tree Protocol on Switch A1 to check whether it solves it.
-
Yup, that would be a good test.
-
@stephenw10 I will let you know, thank you for the help!
P.S. I've just noticed that I put the wrong symbol on the diagram's switches lol.