Dropped packets when using pfSense, Apple laptop, and WiFi
-
On Thursday, my network became unusable, and I still don't know why.
The basic setup is: an Apple Time Capsule in bridge mode, connected to a semi-smart switch, connected to the pfSense box's LAN port (pfSense supplies DHCP and NAT); the pfSense WAN port goes out via Comcast Business Internet.
Any attempt to go outside of LAN results in massive packet loss – over 50% via ping. I've tried two different pfSense boxes, and two different WAPs.
When I run tcpdump on the LAN port on pfSense, I see all of the ICMP requests go out.
When I run tcpdump on the WAN port on pfSense, I see all of the ICMP requests, and the replies. The replies are not being forwarded to LAN most of the time. But not all of the time.
If I plug the laptop into the WiFi box over ethernet, I experience no packet loss.
If I ping machines that are on LAN via wireless, I experience no packet loss.
If I put the WiFi into NAT/DHCP mode (and plug it directly onto the outside network), I experience no packet loss.
I'm at a total loss to explain this behaviour. Can anyone?
-
IP address conflict might explain it. Anything in the system logs?
-
No address conflicts I can see. Nothing in the system logs.
Another behaviour: I can't ping the pfSense box from the laptop over WiFi. Not reliably, I mean. But if I manually delete the arp entry for the laptop on the pfSense box, I can ping it for a few seconds, then it stops again. The arp entry is still there, with the same mac address.
-
Apple devices are notorious for playing ARP games. If you delete an ARP entry and it works for a bit, then stops working, you might have an IP conflict somewhere that pfSense cannot see.
My guess is if you get rid of the Time Capsule and only use wired it'll work fine.
That ought to narrow down where the problem is.
-
I had the Time Capsule out of the equation before I posted here :). (AmpliFi WAP now.)
I manually put in a "perm" arp entry for the laptop, and the packet loss decreased significantly. (Less than 1% over 15 minutes.) That suggests to me that something is invalidating or expiring the arp entry for some unknown reason.
I've been running tcpdump sessions, but haven't seen anything. Well, no, let me amend that a bit – before I did the perm arp entry, I was seeing a lot of who-is for my laptop on the pfSense box. As in, at least one every two seconds.
-
You have something hosed in your layer 2. It's not in pfSense (unless it's something like some static ARP or MAC entry you set that should really never need to be done.) Look elsewhere.
"I've been running tcpdump sessions, but haven't seen anything. Well, no, let me amend that a bit – before I did the perm arp entry, I was seeing a lot of who-is for my laptop on the pfSense box. As in, at least one every two seconds."
You mean ARP WHO HAS? Coming from where? Look at the MAC address. Why can't it get a response? Again, that points to your layer 2 (switching) somewhere.
-
I've replaced everything but the laptop at this point, and we're seeing it with multiple laptops. (But we're a Mac-only household, so of course, if it's the software, that would be expected.)
I've replaced: switch, ethernet cables, pfsense box, wireless access point.
Even with the arp entry I put in, this happens:
[2.3.2-RELEASE][admin@pf.kithrup.com]/root: arp -a -n | fgrep .104
? (192.168.0.104) at (incomplete) on igb0 expired [ethernet]
[2.3.2-RELEASE][admin@pf.kithrup.com]/root: arp -a -n | fgrep .104
? (192.168.0.104) at 60:03:08:9a:3d:10 on igb0 expires in 1200 seconds [ethernet]
-
If you have put in ARP entries or MAC addresses anywhere, take them out. They are unnecessary.
What device is this: 60:03:08:9a:3d:10 ??
-
tcpdump -i igb0 arp and host 192.168.0.104 and 192.168.0.254
13:57:38.293875 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:57:38.405415 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:57:42.301071 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:57:42.501395 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:57:47.327086 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:57:47.416649 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:57:52.350079 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:57:52.536667 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:57:56.370888 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:57:56.427692 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:58:01.396844 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:58:01.547955 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:58:05.264035 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:58:05.439074 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:58:10.443428 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:58:10.559356 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:58:14.517282 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:58:14.655328 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:58:19.294710 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:58:19.366364 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:58:23.672889 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:58:23.871411 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:58:28.530521 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:58:28.581742 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
60:03:08:9a:3d:10 is 192.168.0.104 aka my laptop. Those numbers are correct.
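For anyone who wants to put numbers on a capture like this, here is a rough Python sketch that reads tcpdump ARP lines in the format shown above and reports how often the requests repeat and how quickly each one is answered. The function name summarize_arp and the SAMPLE constant are made up for illustration, SAMPLE is just the first three request/reply pairs copied from the capture, and the sketch assumes the capture does not cross midnight (the timestamps carry no date).

```python
import re
from datetime import datetime

# Matches the two tcpdump ARP line shapes seen in the capture above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) ARP, "
    r"(?:Request who-has (?P<req_ip>\S+) tell (?P<asker>[^,\s]+)"
    r"|Reply (?P<rep_ip>\S+) is-at (?P<mac>[0-9a-f:]{17}))"
)

# First three request/reply pairs from the capture (illustrative subset).
SAMPLE = """\
13:57:38.293875 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:57:38.405415 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:57:42.301071 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:57:42.501395 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
13:57:47.327086 ARP, Request who-has 192.168.0.104 tell 192.168.0.254, length 28
13:57:47.416649 ARP, Reply 192.168.0.104 is-at 60:03:08:9a:3d:10, length 46
"""


def summarize_arp(lines):
    """Return (gaps, latencies) in seconds.

    gaps      -- time between consecutive who-has requests
    latencies -- time from each request to the next reply that follows it
    """
    request_times = []
    latencies = []
    pending = None  # timestamp of the most recent unanswered request
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not an ARP request/reply line
        ts = datetime.strptime(match["ts"], "%H:%M:%S.%f")
        if match["req_ip"]:
            request_times.append(ts)
            pending = ts
        elif pending is not None:
            latencies.append((ts - pending).total_seconds())
            pending = None
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(request_times, request_times[1:])
    ]
    return gaps, latencies


if __name__ == "__main__":
    gaps, latencies = summarize_arp(SAMPLE.splitlines())
    print("seconds between requests:", [round(g, 3) for g in gaps])
    print("seconds until reply:", [round(l, 3) for l in latencies])
```

Run against the full capture above, this kind of summary shows pfSense re-asking for 192.168.0.104 roughly every 4 to 5 seconds, with each request answered within about 0.2 seconds. So the replies are arriving; the open question is why the entry keeps needing to be re-resolved.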
-
Problem is provisionally solved: we disabled IPv6 on WAN and LAN. (WAN was set to DHCP6, and LAN was set to Track Interface.)
Since I was never able to get IPv6 working (Comcast Business Internet), this isn't a big loss for the moment.