VLAN occasionally unresponsive for a minute or two
-
Hi there,
I have a pfSense box with four Intel LAN ports (Qotom 330G4). My setup is:
ISP modem (bridge mode) <-> igb0 pfSense <-> igb2.VLANs <-> VLAN switch <-> devices
I also set up igb1 on pfSense as a LAN with the anti-lockout rule for debugging (nothing is normally connected to that port).
Configuration:
- pfSense 2.4.5-RELEASE-p1 (it was 2.4.5-RELEASE until yesterday, with the same misbehavior)
- igb0: DHCP (WAN)
- igb1: 192.168.99.1 (LAN)
- VLAN 20: trusted network
- VLAN 30: IoT devices
- VLAN 40: guest network (Wi-Fi, connected to the switch)
Everything works as intended: addressing, separation, rules. However, from time to time I lose connectivity for roughly 60 to 120 seconds. By "connectivity loss" I mean that I cannot ping the pfSense address on that VLAN. For instance, VLAN 20's pfSense address is 192.168.20.1 and my PC is 192.168.20.107. I can ping 192.168.20.1 from my PC; then suddenly it stops responding for 60 seconds or so, and comes back. The same happens on all VLANs.
To test this properly, I moved my PC off wireless (trusted VLAN) to rule out the wireless AP misbehaving, and plugged it into a VLAN 20 access port on the switch. Same issue: from time to time I cannot ping the pfSense box for a while. To quantify it, I sent pings (one per second) for about 8 hours:
Thu Jun 11 00:30:00 PDT 2020
PING 192.168.20.1 (192.168.20.1) 56(84) bytes of data.
--- 192.168.20.1 ping statistics ---
30000 packets transmitted, 28785 received, 4% packet loss, time 30067951ms
rtt min/avg/max/mdev = 0.541/6.673/3308.251/30.534 ms, pipe 4
Thu Jun 11 08:51:15 PDT 2020
I have another PC on the same VLAN, so I pinged it during the same period:
Thu Jun 11 00:30:00 PDT 2020
PING 192.168.20.254 (192.168.20.254) 56(84) bytes of data.
--- 192.168.20.254 ping statistics ---
30000 packets transmitted, 30000 received, 0% packet loss, time 30040097ms
rtt min/avg/max/mdev = 0.726/6.598/3648.940/32.813 ms, pipe 4
Thu Jun 11 08:50:40 PDT 2020
Notice that I lost 4% of the packets to pfSense over the 8-hour span, and none to the other PC on the same VLAN.
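The summary lines only report aggregate loss, so they cannot show whether the 4% comes from a few long 60-120 s outages or from scattered single drops. One way to check is to log every reply with a timestamp and look for gaps in icmp_seq. This is a minimal sketch, assuming Linux iputils ping run with -D (which prefixes each output line with a Unix timestamp) and the reply-line format shown in the code comment; `find_outages` and `min_missing` are hypothetical names, not part of any tool.

```python
import re

# Reply line format assumed from Linux iputils `ping -D`, for example:
# [1591860600.123456] 64 bytes from 192.168.20.1: icmp_seq=1 ttl=64 time=0.541 ms
REPLY = re.compile(r"^\[(?P<ts>\d+\.\d+)\] \d+ bytes from .*icmp_seq=(?P<seq>\d+)")


def find_outages(lines, min_missing=5):
    """Return (last_ok_ts, next_ok_ts, missing) for each gap of at least
    min_missing consecutive unanswered icmp_seq numbers.

    Assumes fewer than 65536 probes, so icmp_seq never wraps around.
    Gaps before the first reply or after the last one are not reported.
    """
    reply_ts = {}
    for line in lines:
        m = REPLY.match(line)
        if m:
            # setdefault keeps the first timestamp if a DUP! reply appears.
            reply_ts.setdefault(int(m.group("seq")), float(m.group("ts")))
    seqs = sorted(reply_ts)  # late replies may arrive out of order
    outages = []
    for prev, cur in zip(seqs, seqs[1:]):
        missing = cur - prev - 1
        if missing >= min_missing:
            outages.append((reply_ts[prev], reply_ts[cur], missing))
    return outages


# Made-up sample log: replies 3..62 never arrive.
sample = [
    "[1000.000000] 64 bytes from 192.168.20.1: icmp_seq=1 ttl=64 time=0.6 ms",
    "[1001.000000] 64 bytes from 192.168.20.1: icmp_seq=2 ttl=64 time=0.6 ms",
    "[1062.000000] 64 bytes from 192.168.20.1: icmp_seq=63 ttl=64 time=0.7 ms",
]
for start, end, missing in find_outages(sample):
    print(f"{missing} replies missing, ~{end - start:.0f} s between replies")
# prints: 60 replies missing, ~61 s between replies
```

Runs of dozens of consecutive missing replies would match the 60-120 s outages described here; scattered single drops would point to a different kind of problem.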
On the pfSense side, there are no errors or retransmissions on the VLAN interface, and nothing in the logs (general, DHCP, DNS Resolver, or firewall) that seems directly associated with this issue. In the switch logs I cannot find any link up/down events that would indicate a connection failure. I also tried another Ethernet cable, just in case.
Digging further, I attached a cable to the LAN port on pfSense and pinged it for 1 hour with no packet loss. Note that while the VLAN loses connectivity from time to time, the LAN does not:
Pinging the pfSense VLAN (igb2.20):
Thu Jun 11 12:37:00 PDT 2020
PING 192.168.20.1 (192.168.20.1) from 192.168.20.107 wlp1s0: 56(84) bytes of data.
--- 192.168.20.1 ping statistics ---
3600 packets transmitted, 3529 received, 1% packet loss, time 3605064ms
rtt min/avg/max/mdev = 0.612/7.330/1026.348/24.878 ms, pipe 2
Thu Jun 11 13:37:07 PDT 2020
Pinging the pfSense LAN (igb1):
Thu Jun 11 12:37:00 PDT 2020
PING 192.168.99.1 (192.168.99.1) 56(84) bytes of data.
--- 192.168.99.1 ping statistics ---
3600 packets transmitted, 3600 received, 0% packet loss, time 3676976ms
rtt min/avg/max/mdev = 0.113/0.451/4.052/0.094 ms
Thu Jun 11 13:38:17 PDT 2020
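Since the probes go out once per second, the lost-packet counts in these summaries convert directly into time without replies. A small sketch of that arithmetic, using the four summary lines from this post; `summarize` is a hypothetical helper. Overnight, 30000 - 28785 = 1215 probes were lost (4.05%), roughly 20 minutes of silence, which at 60-120 s per outage works out to something like 10 to 20 outages. For the 1-hour VLAN run, 71 of 3600 is about 1.97%; the summary line prints 1%, which suggests this ping build truncates the percentage to an integer rather than rounding it.

```python
def summarize(transmitted, received, interval_s=1.0):
    """Return (lost, loss_percent, approx_seconds_without_reply), assuming
    one probe every interval_s seconds (1 s is ping's default interval)."""
    lost = transmitted - received
    return lost, 100.0 * lost / transmitted, lost * interval_s


# (transmitted, received) taken from the ping summaries in this post.
runs = {
    "192.168.20.1, VLAN 20, overnight": (30000, 28785),
    "192.168.20.254, other PC, overnight": (30000, 30000),
    "192.168.20.1, VLAN 20, 1 h": (3600, 3529),
    "192.168.99.1, LAN igb1, 1 h": (3600, 3600),
}
for name, (tx, rx) in runs.items():
    lost, pct, secs = summarize(tx, rx)
    print(f"{name}: {lost} lost ({pct:.2f}%), ~{secs / 60:.1f} min without replies")
```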
I already tried reinstalling pfSense from scratch and redoing all the configuration by hand. No luck. Since intra-VLAN traffic is not losing packets, I don't think the switch is the issue here; I checked its configuration twice just in case, and nothing seems wrong there either.
This is a residential LAN with 15 or so physical devices and 3 or 4 virtual machines, nothing fancy or traffic-demanding. I never saw pfSense CPU usage above 5%, and no Snort, pfBlockerNG, or other packages are installed yet.
Does anyone have any clue? I don't know what to look for anymore. It seems to be related to the VLAN configuration, because the LAN does not suffer from this issue. However, I checked the VLAN configs, redid them from scratch following the documentation, and nothing seems to help. What am I overlooking?
Thanks a lot,
Jose
-
After lots of debugging and many lost hours, this turned out to be an incompatibility between the NIC in the computer I was running the tests from and the Intel NIC on the pfSense box. Probably a defective card, though it only failed when talking to the Intel NIC on pfSense (weird).
I replaced the NIC with a new one and the problem is gone.