SG-3100 switch weird behavior (resolved)
-
@mcury yeah you shouldn't be seeing those. Hmmmm. Even if your NIC was in promiscuous mode, traffic for that MAC shouldn't be sent down a port where the MAC is not listed.
If you had some sort of leak or bridge where the MAC was being learned on multiple interfaces, that could happen..
So the proper destination for that MAC is down the trunk (lan4) to the Flex Mini. But pfSense is also sending it out lan1? The only place pfSense should see the MAC of that Pi 4 is on the lan4 interface, so it should never send traffic for that MAC out lan1, unless there was a bridge set up.
hmmmm strange...
This is a good question for @stephenw10, he would know way more than me about the inner workings of the switch in the 3100. But typically a switch would only send traffic down the interface that the MAC is on.
-
@johnpoz Exactly, it's so weird, that packet should never go to pfSense's LAN1..
I'll try to fix it tonight by reinstalling my pfsense from scratch..
Then, if the problem happens again, I'll replace this switch..
-
Yeah, it's a pretty basic switch and there's no control over things like the MAC table. That's the only thing I could imagine causing that though.
If you haven't already, try power cycling the 3100 entirely. That should completely reset the switch if it's somehow managed to toggle some flag.
Steve
-
@stephenw10 hm, I'll try it now: shut down, then remove the power cable. One sec, let me see who is here using the Internet
-
Done, the problem persists..
- Halted the system and, once the shutdown process ended, removed the power cable for a few seconds.
-
stephenw10 (Netgate Administrator), Oct 18, 2022, 4:17 PM, last edited by stephenw10 Oct 18, 2022, 4:18 PM
Hmm, the only other thing I could imagine causing this is if something is feeding bad data into the switch MAC table. That would have to be the desktop machine.
If you run a continuous ping from the RasPi to somewhere that has to be accessed through the 3100 switch, does that prevent the issue?
If it does I'd try to find something sending the RasPi MAC from the desktop. Hard to say what that might be.... something reflected perhaps?
Run a pcap on the desktop, filtered by the RasPi MAC address, while the problem is not happening and wait for it to start. The first thing that happens there might be the offending packet.
Steve
-
@stephenw10 said in SG-3100 switch weird behavior:
If you run a continuous ping from the RasPi to somewhere that has to be accessed through the 3100 switch, does that prevent the issue?
Testing now, ping is running from RPI4 to pfsense.
It seems to have stopped, but it may start again soon, so I'll wait a little longer this time.
Packet capture set:
Edit:
This is my ARP table (desktop)
$ cat /proc/net/arp
IP address HW type Flags HW address Mask Device
192.168.255.252 0x1 0x2 00:11:32:9f:ee:93 * enp7s0
192.168.255.249 0x1 0x2 00:08:a2:0c:c4:1c * enp7s0
192.168.255.250 0x1 0x2 b8:27:eb:ea:f8:65 * enp7s0
192.168.255.253 0x1 0x2 dc:a6:32:a5:47:19 * enp7s0
-
40 minutes pinging from raspberry pi 4b (192.168.255.253) to pfsense (192.168.255.249) and no problem so far.
I have two wireshark windows opened, one monitoring:
eth.src == dc:a6:32:a5:47:19 and not tcp.port == 22 and not tcp.port == 9000
And the second one monitoring:
ip.addr == 192.168.255.253 and not tcp.port == 9000 and not tcp.port == 22
-
Dropped the ping and one minute later (or less), the problem starts again:
desktop ARP table:
$ cat /proc/net/arp
IP address HW type Flags HW address Mask Device
192.168.255.252 0x1 0x2 00:11:32:9f:ee:93 * enp7s0
192.168.255.249 0x1 0x2 00:08:a2:0c:c4:1c * enp7s0
192.168.255.250 0x1 0x2 b8:27:eb:ea:f8:65 * enp7s0
192.168.255.253 0x1 0x2 dc:a6:32:a5:47:19 * enp7s0
-
Hmm, so nothing from the RasPi MAC address at the desktop that might be inserting invalid entries into the switch.
It might be worth re-running that test using the RasPi MAC as destination in the filter (or as either).
You might catch something arriving using that but a different IP address.
Also, when this happens do you see traffic being sent only to the desktop? Or is the syslog traffic sent to all the 3100 switch ports? Does it also arrive at the RasPi?
Steve
-
@stephenw10 It seems that it's only going to LAN1..
The Raspberry Pi 3 on which you see the tcpdump above is connected to the UniFi mini switch.
Let me perform this test again, but on the NAS, which is connected to LAN2 of pfSense, one sec.
-
Hmmm, it's going to port LAN2 of pfSense too:
NAS IP is 192.168.255.252 (tcpdump) (LAN2 of pfSense)
On the right, Wireshark running on the desktop (LAN1 of pfSense)
-
Aha, interesting. You wouldn't expect to see it on one of the other UniFi switch ports because it should only send it out of the port that MAC is connected to. So to the RasPi4 there.
The same should be true of the switch in the 3100. The fact it seems to be sending it to all ports implies that it no longer has an entry for the MAC address in its table. If it was an incorrect entry, as I speculated earlier, then it would only send from port 1.
Because that traffic is UDP with no replies, it never sees any traffic from the RasPi4 to repopulate the table. Is the RasPi configured with a static IP?
It seems unexpected that the table entry has expired though. Approximately how long does it take to fail after sending some pings?
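The flooding behaviour described above can be illustrated with a toy model of a learning switch. This is only a sketch, not the Marvell switch logic: the `LearningSwitch` class, the `cpu` port name and the 180 s aging time are assumptions made for illustration, and the MAC addresses are the pfSense and RasPi4 ones from the ARP table posted earlier.

```python
class LearningSwitch:
    """Toy model of a learning switch with MAC table aging (illustrative only)."""

    def __init__(self, ports, aging_time=180.0):
        self.ports = list(ports)
        self.aging_time = aging_time  # assumed value, in seconds
        self.table = {}               # MAC -> (port, time last seen as a source)

    def forward(self, now, in_port, src_mac, dst_mac):
        """Learn the source MAC, then return the list of egress ports."""
        self.table[src_mac] = (in_port, now)
        entry = self.table.get(dst_mac)
        if entry is not None and now - entry[1] <= self.aging_time:
            return [entry[0]]
        # Unknown or aged-out destination: flood out of every other port.
        self.table.pop(dst_mac, None)
        return [p for p in self.ports if p != in_port]


PFSENSE = "00:08:a2:0c:c4:1c"
PI4 = "dc:a6:32:a5:47:19"
sw = LearningSwitch(["cpu", "lan1", "lan2", "lan4"])

sw.forward(0, "lan4", PI4, PFSENSE)           # RasPi4 transmits once (e.g. a ping)
fresh = sw.forward(60, "cpu", PFSENSE, PI4)   # syslog 1 min later
stale = sw.forward(200, "cpu", PFSENSE, PI4)  # syslog after more than 180 s of silence
print(fresh)  # ['lan4']
print(stale)  # ['lan1', 'lan2', 'lan4']
```

Note that frames sent towards the RasPi4 never refresh its entry, because a switch only learns from source addresses. Only traffic sent by the RasPi4 itself keeps the entry alive in this model.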
Steve
-
@stephenw10 said in SG-3100 switch weird behavior:
How long does it take to fail after sending some pings approximately?
the default cache in pfSense is like 20 minutes, but maybe not for the switch MAC table? Is there any way to view the switch's MAC address table?
-
@stephenw10 said in SG-3100 switch weird behavior:
Because that traffic is UDP with no replies it never sees any traffic from the RasPi4 to repopulate the table. Is the RasPi configured with a static IP?
The Raspberry Pi 4B is on DHCP, with no services running on it other than Graylog, which means that the device only receives UDP data.
It seems unexpected that the table entry has expired though. How long does it take to fail after sending some pings approximately?
I'll try to get that info right now.
-
@johnpoz said in SG-3100 switch weird behavior:
the default cache in pfSense is like 20 minutes, but maybe not for the switch MAC table? Is there any way to view the switch's MAC address table?
I'm really missing my old Cisco days, show mac address-table vlan x :)
-
@mcury hehe - yeah would be easy to see then.. Why I like my routers with interfaces, leave the switch ports to the actual switches ;)
-
@johnpoz said in SG-3100 switch weird behavior:
@mcury hehe - yeah would be easy to see then.. Why I like my routers with interfaces, leave the switch ports to the actual switches ;)
:) Yes, you have a point there ehhe
tcpdump in pfsense during the DHCP negotiation with raspberry pi 4b
-
Yeah the pfSense ARP cache expiry time is completely independent of the switch MAC table. I don't believe there's any way to query the switch IC for the table or for the expiry time.
Steve
-
@stephenw10 said in SG-3100 switch weird behavior:
Yeah the pfSense ARP cache expiry time is completely independent of the switch MAC table. I don't believe there's any way to query the switch IC for the table or for the expiry time.
Steve
3 minutes exactly.
-
I recorded it, not sure if it's going to be useful..
arp_problem.zip
-
Hmm, well that seems very precise. Unlikely to be random then. The Marvell 88E6141 has a 2048 address MAC table. I'm going to assume you don't have >2000 devices!
I guess it's feasible something could be continually generating random MACs and filling the table. You would see that in a pcap though.
I can't find a value for a default expiry time. I'm not sure why it would expire at all.
If the table were being reset I might imagine something else would be reset too. Are you seeing any other traffic interrupted at the 3min mark?
Also can you confirm this is just unexpected, it's not actually failing to pass any traffic?
Steve
-
@stephenw10 said in SG-3100 switch weird behavior:
Hmm, well that seems very precise. Unlikely to be random then. The Marvell 88E6141 has a 2048 address MAC table. I'm going to assume you don't have >2000 devices!
Not even close.. 25 approximately..
If the table were being reset I might imagine something else would be reset too. Are you seeing any other traffic interrupted at the 3min mark?
No, everything is normal, I noticed it during a packet capture to check something else, otherwise I wouldn't even notice it..
Also can you confirm this is just unexpected, it's not actually failing to pass any traffic?
No, my network is running perfectly, the only issue is this, it seems that the marvell switch spams the packets to everyone since the mac table expired..
I'm thinking here, maybe install something in the Raspberry Pi 4b to force it to use the internet every 3 minutes?
Maybe a cron to run a single ping command?
-
Yes, that would correct it. Doesn't have to be to something external, it just needs to hit the switch in the 3100.
You could set the ARP timeout in pfSense to <3mins. That way pfSense will ARP for the RasPi when it times out and the RasPi will respond refilling the switch table.
That's an easy test:
[22.11-DEVELOPMENT][admin@3100.stevew.lan]/root: sysctl net.link.ether.inet.max_age=120
net.link.ether.inet.max_age: 1200 -> 120
Steve
-
@stephenw10 said in SG-3100 switch weird behavior:
sysctl net.link.ether.inet.max_age=120
done:
[22.05-RELEASE][root@pfsense.home.arpa]/root: sysctl net.link.ether.inet.max_age=120
net.link.ether.inet.max_age: 1200 -> 120
[22.05-RELEASE][root@pfsense.home.arpa]/root:
-
Strange, sometimes it doesn't take 3 minutes for the problem to happen.
I'm not sure if the Marvell MAC address entry is really expiring, or if the problem is something else..
I reverted the change to 1200 and configured a cron job on the rpi4
*/1 * * * * /usr/bin/ping 192.168.255.249
Lets see how that goes..
-
stephenw10 (Netgate Administrator), Oct 18, 2022, 9:11 PM, last edited by stephenw10 Oct 18, 2022, 9:12 PM
I expect that to solve it.
This has been interesting, I've never had to look into it too closely before. I can't find a specific value for the 3100 switch but for the switch in the 7100, which is from the same family of devices, the default MAC address aging time is 300s (5 mins). That has a larger table size so 3mins for the 3100 doesn't seem that unreasonable.
I suspect this might simply be down to the traffic pattern you have to the RasPi4: it's mostly UDP, so it never sends a reply.
Steve
-
@stephenw10 I believe that is the issue..
My cron is smashing pfSense with pings, let me change that cron to 1 minute
-
This seems to be enough..
* * * * * /usr/bin/ping 192.168.255.249 -c 2
-
This solved the problem, 13 minutes of cron job running, no more problems..
Thanks a lot for the help @stephenw10 and @johnpoz :)
-
Cool. I think I prefer reducing the ARP timeout as a solution. You might try setting that to 1 min and see if that also solves it. That's just a system tunable in pfSense, all in the config.
But either will work fine.
Steve
-
@stephenw10 said in SG-3100 switch weird behavior (resolved):
Cool. I think I prefer reducing the ARP timeout as a solution.
Wouldn't that change the behavior for everything? Like a global setting?
Would this ARP timeout only be triggered for a host that is not "alive", like the Raspberry Pi 4B we just observed?
-
It would be global but 1min is not that unusual. I believe Windows uses 30s.
What would happen is that every minute the RasPi4 entry in the pfSense ARP table would time out. So in order to send syslog traffic to it it will ARP for the IP address and the RasPi4 will respond to that refreshing the MAC table in the switch.
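As a rough illustration of why either refresh approach works, the sketch below counts how many seconds a switch entry would be missing for a given refresh interval. It is a simplified model under stated assumptions: the 180 s aging time is taken from the roughly 3 minutes observed in this thread, `flooded_seconds` is a hypothetical helper, and it assumes the refresh (a cron ping, or an ARP reply triggered when the pfSense ARP entry expires) happens exactly on schedule.

```python
def flooded_seconds(aging_time, refresh_interval, duration):
    """Count seconds in [0, duration) during which the switch has no live entry.

    The entry is refreshed at t = 0, refresh_interval, 2 * refresh_interval, ...
    and is treated as live for aging_time seconds after each refresh.
    refresh_interval=None models a host that transmits once and then stays silent.
    """
    flooded = 0
    last_seen = 0
    for t in range(duration):
        if refresh_interval and t % refresh_interval == 0:
            last_seen = t
        if t - last_seen > aging_time:
            flooded += 1
    return flooded


AGING = 180  # seconds; assumed, not a documented value for the 3100 switch

silent = flooded_seconds(AGING, None, 3600)          # RasPi4 never transmits
cron_ping = flooded_seconds(AGING, 60, 3600)         # cron ping every minute
short_arp = flooded_seconds(AGING, 120, 3600)        # max_age=120
default_arp = flooded_seconds(AGING, 1200, 3600)     # default max_age=1200
print(silent, cron_ping, short_arp, default_arp)
```

In this model any refresh interval shorter than the switch aging time leaves no window in which frames for the RasPi4 would be flooded, which is why both a 1 minute ping and an ARP timeout below 3 minutes would be expected to help.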
It feels like a cleaner solution to me, but if the ping is working for you then there's no need to change it. It would be interesting to know whether that works if you're able to test it.
Steve
-
@stephenw10 said in SG-3100 switch weird behavior (resolved):
It would be interesting to know whether that works if you're able to test it.
Sure, I'll disable the cron job and test it right now, wireshark is already running, one sec
-
johnpoz (LAYER 8, Global Moderator), Oct 18, 2022, 10:14 PM, last edited by johnpoz Oct 18, 2022, 10:21 PM
@stephenw10 said in SG-3100 switch weird behavior (resolved):
I believe Windows uses 30s.
windows uses a weird way of doing it, they use 30 seconds as base and then add a random multiplier on it.. But you can adjust it if you want.
Wouldn't setting a static arp for the rpi4 also solve it? Or that is different than the switch arp cache?
I wish arp in windows actually showed you what was left on the cache, like freebsd, linux should do that too. At least in linux you can use ip -statistics neigh
I don't know of any way to actually view how much time is left on mac that is cached.
-
@johnpoz said in SG-3100 switch weird behavior (resolved):
Wouldn't setting a static arp for the rpi4 also solve it? Or that is different than the switch arp cache?
I tried that, actually the static ARP is set right now..
-
@mcury so what is better solution? More arps going out for everything ;) Or just pinging pfsense from the rpi4 every minute or 2 minutes..
Weird one for sure..
-
@johnpoz said in SG-3100 switch weird behavior (resolved):
so what is better solution? More arps going out for everything ;) Or just pinging pfsense from the rpi4 every minute or 2 minutes..
Weird one for sure..
ehhe, that is weird indeed.
I'm not sure how it works, but it seems that every packet that goes through the switch resets that ARP timer, so the firewall wouldn't need to broadcast it as often.
-
@johnpoz said in SG-3100 switch weird behavior (resolved):
Wouldn't setting a static arp for the rpi4 also solve it? Or that is different than the switch arp cache?
Yeah, nothing to do with the switch MAC table. That exists only in the switch IC.
So in fact I would expect setting a static ARP to make this worse because it will never expire, pfSense will never ARP for the IP so no responses will be generated.
So if it was still set static I'm surprised that max_age value made any difference.