ARP reports bogons
-
@johnpoz sure, I understand that arpwatch is just reporting bogons, but it still shows me that something is floating around on 192.168.1.112 which is not a subnet I use. So why is it even there?
It is DHCP: clients lose their IP, and I cannot renew.
But this can happen like 3 times within 10 minutes. Why would the client even be asking for 3 leases in 10 minutes?
Something just doesn't add up.
-
@deanfourie said in ARP reports bogons:
asking for 3 leases in 10 minutes?
because it didn't actually renew?
As a client gets closer and closer to when it expires - it will ask more and more frequently. When it fails then it would send a discover, etc.
I would suggest you troubleshoot a specific client that is failing - what exactly is failing? Do you see your lease past the 50% mark? If you do, that is an example of it not renewing for some reason. A DHCP client gets a lease and then tries to renew at the 50% mark, so if you see a lease that is older than 50% ish of the lease time, something is wrong with the renew process.
I would suggest you troubleshoot with a client that allows you to see such info, like a Windows box.
Example: I just switched my PC to DHCP here, and you can see the lease is for 4 days.
If it's past say 2:15PM on the 12th and I still have this same lease, something isn't right, because it should have renewed right around the 50% mark. So if it's say 3PM on the 12th and it is still showing this same lease, something is wrong for sure.
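To put numbers on the 50% rule, here is a rough Python sketch. It assumes the RFC 2131 default timers (renew at 50% of the lease, rebind at 87.5%); a server can override these, so treat them as defaults. The lease start of 2:15PM on the 10th and the year/month are inferred from the 4 day lease and the log dates in this thread, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta

def renewal_times(lease_start, lease_seconds):
    """Return (T1, T2, expiry) using the RFC 2131 default timer fractions."""
    lease = timedelta(seconds=lease_seconds)
    t1 = lease_start + lease * 0.5      # renewal: client asks the leasing server
    t2 = lease_start + lease * 0.875    # rebinding: client broadcasts to any server
    return t1, t2, lease_start + lease

def renewal_overdue(lease_start, lease_seconds, now):
    """True if the client still holds the original lease after T1 has passed."""
    t1, _, _ = renewal_times(lease_start, lease_seconds)
    return now > t1

# Assumed lease start (inferred, not shown in the thread): Oct 10, 2:15PM, 4 day lease.
start = datetime(2022, 10, 10, 14, 15)
four_days = 4 * 24 * 3600
print(renewal_times(start, four_days))
print(renewal_overdue(start, four_days, datetime(2022, 10, 12, 15, 0)))
```

If the lease start shown on the client has not moved once `renewal_overdue` is true, the renew is failing somewhere between the client and the DHCP server.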
As to that device with the 112 address: what is that device? I don't see 02:2d:1e in a lookup, so it could be a private MAC - wireless tablets and phones, depending on the OS etc. of the device, can use random MAC addresses. But track down how that device is connected to your network - what switch port is it on? Is it connected to your AP, etc.
Again, devices can send out that info - say I take my phone and connect it to network X, and then move it to your network. It could send out a probe or gratuitous ARP with its old IP, since a device can try to reuse the IP it had on a previous network. What should happen is your DHCP server sends a NAK, and then the client sends out a discover and gets an IP from your DHCP server. Seeing such entries could also suggest, as mentioned already, another DHCP server on your network handing out IPs in that range.
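A simplified sketch of that NAK decision, in Python. It only checks whether the requested address belongs to the subnet the server hands out; a real DHCP server also checks whether the address is free, whether it is authoritative, and so on. The `10.0.0.0/24` range and the function name are made up for the example.

```python
import ipaddress

def server_reply(requested_ip, served_subnet):
    """Reply to a client re-requesting an address it held on a previous network."""
    ip = ipaddress.ip_address(requested_ip)
    net = ipaddress.ip_network(served_subnet)
    # An address outside the served subnet is wrong for this network: refuse it.
    return "DHCPACK" if ip in net else "DHCPNAK"

# Hypothetical: client asks for its old 192.168.1.112 on a network serving 10.0.0.0/24.
print(server_reply("192.168.1.112", "10.0.0.0/24"))
print(server_reply("10.0.0.50", "10.0.0.0/24"))
```

After the NAK the client should fall back to a discover and pick up a lease from the local range.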
That c4:9d:ed mac is listed as Microsoft.
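A quick way to tell a private/random MAC from a vendor-assigned one is the locally administered bit (the 0x02 bit of the first octet). A minimal Python sketch, with a made-up function name:

```python
def is_locally_administered(mac):
    """True if the locally administered bit (0x02 of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)

# Prefixes and MACs taken from this thread.
for mac in ("02:2d:1e", "c4:9d:ed:89:ed:05", "c0:33:5e:31:9e:87"):
    print(mac, is_locally_administered(mac))
```

The 02:2d:1e prefix has the bit set, which is why no vendor lookup turns up for it, while the c4:9d:ed and c0:33:5e prefixes do not.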
-
@johnpoz also keep getting these in my logs.
Oct 12 00:05:59 nginx 2022/10/12 00:05:59 [error] 92058#100543: send() failed (54: Connection reset by peer)
and these
Oct 12 00:07:00 php 18242 servicewatchdog_cron.php: Service Watchdog detected service suricata stopped. Restarting suricata (Suricata IDS/IPS Daemon)
-
@deanfourie said in ARP reports bogons:
@johnpoz also keep getting these in my logs.
Oct 12 00:05:59 nginx 2022/10/12 00:05:59 [error] 92058#100543: send() failed (54: Connection reset by peer)
and these
Oct 12 00:07:00 php 18242 servicewatchdog_cron.php: Service Watchdog detected service suricata stopped. Restarting suricata (Suricata IDS/IPS Daemon)
Never put Suricata or Snort under Service Watchdog! Service Watchdog does not understand how the two IDS/IPS packages operate and it will restart them needlessly-- sometimes resulting in multiple duplicate instances running on the same interface.
I am the developer/maintainer of both IDS/IPS packages, so I know what I am talking about ...
While it may not be related to your immediate issue, you should never configure the IDS/IPS packages to be monitored by Service Watchdog.
-
@bmeeks Ok thanks,
I removed Suricata just to eliminate it as being an issue.
This did not resolve the issue. Last night I had the same problems, multiple interruptions and disconnects at around midnight.
I cleared all pfSense logs, and the ONLY thing I can see that occurs during this time is that in my system logs it reports the bogon at 0.0.0.0.
Oct 12 00:50:17 arpwatch 45709 bogon 0.0.0.0 c4:9d:ed:89:ed:05
Oct 12 00:50:18 arpwatch 45709 bogon 0.0.0.0 c4:9d:ed:89:ed:05
Oct 12 00:50:18 arpwatch 45709 bogon 0.0.0.0 c4:9d:ed:89:ed:05
Oct 12 00:50:33 arpwatch 45709 bogon 0.0.0.0 c4:9d:ed:89:ed:05
Oct 12 00:50:34 arpwatch 45709 bogon 0.0.0.0 c4:9d:ed:89:ed:05
Oct 12 00:50:35 arpwatch 45709 bogon 0.0.0.0 c4:9d:ed:89:ed:05
Oct 12 01:00:00 php 22959 [pfBlockerNG] Starting cron process.
Oct 12 01:00:06 php 22959 [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Oct 12 01:25:11 arpwatch 45709 bogon 0.0.0.0 c0:33:5e:31:9e:87
Oct 12 01:25:12 arpwatch 45709 bogon 0.0.0.0 c0:33:5e:31:9e:87
Oct 12 01:25:13 arpwatch 45709 bogon 0.0.0.0 c0:33:5e:31:9e:87
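To see at a glance which MACs are behind these entries, the log can be tallied with a short Python script. This is only a sketch: the regex assumes the exact "arpwatch <pid> bogon <ip> <mac>" layout shown above, and `SAMPLE` is a subset of those lines.

```python
import re
from collections import Counter

# Matches lines like: "... arpwatch 45709 bogon 0.0.0.0 c4:9d:ed:89:ed:05"
BOGON_RE = re.compile(r"arpwatch \d+ bogon (\S+) ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})")

def bogons_by_mac(log_text):
    """Count bogon reports per (ip, mac) pair."""
    return Counter(BOGON_RE.findall(log_text))

SAMPLE = """\
Oct 12 00:50:17 arpwatch 45709 bogon 0.0.0.0 c4:9d:ed:89:ed:05
Oct 12 00:50:18 arpwatch 45709 bogon 0.0.0.0 c4:9d:ed:89:ed:05
Oct 12 01:00:00 php 22959 [pfBlockerNG] Starting cron process.
Oct 12 01:25:11 arpwatch 45709 bogon 0.0.0.0 c0:33:5e:31:9e:87
"""

for (ip, mac), count in bogons_by_mac(SAMPLE).items():
    print(ip, mac, count)
```

Each burst comes from a single MAC and is spaced about a second apart.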
One thing I don't understand: say a client asks where the gateway is ("who has 192.168.1.1") and the response is "192.168.1.1 is at 0.0.0.0". Isn't this going to cause the exact issues I'm having?
I really suspect this is what is happening.
Your thoughts?
-
the response is 192.168.1.1 is at 0.0.0.0
that is not what is happening... Where are you seeing that 192.168.1.1 is at 0.0.0.0??
There is no way you can tell that from what arpwatch is saying there. Let's see the sniff of these ARPs.
I would really suggest you just turn off arpwatch, or tell it not to log bogon - what is in your arpwatch db? It only seems to be confusing the issue for you.
My db is very small because I only had arpwatch on for short time to catch that it was showing bogon on my network as well
Let's see your pfSense ARP table.
Here is snip of mine
Are you not seeing IPs in there that you should see? Do you see anything listed at 0.0.0.0? Are you seeing any incompletes for something?
So I pinged something from pfSense that doesn't exist. See how it shows incomplete in the ARP table, because it ARPed for that IP and got no response.
So while you were having this problem did you actually check what was going on one of your machines. Did it not have a lease, did it not have an IP, could it not ping its gateway, could it not do a dns query?
-
@deanfourie said in ARP reports bogons:
response is 192.168.1.1 is at 0.0.0.0
That makes no sense. You might possibly see 0.0.0.0 is at (MAC address) aa:bb:cc:dd:ee:ff. But you shouldn't since nothing should be using that IP.
Or you might see 'who has <gateway IP> tell 0.0.0.0'. But the response would be <gateway IP> is at <MAC address>. The only way clients get a gateway is via DHCP.
DHCP is failing here and from everything you've said it seems most likely the cause is something in the AP.
Steve
-
I did do some testing during these periods.
I was not able to ping anything local or remote (internet). I checked my IP settings: I still had a current lease, and all IP settings were correct. Internet and ALL LAN access just abruptly stops.
Now to me, if my local ARP table was being updated to point say the gateway, or all devices to 0.0.0.0, then this is the exact behavior I would expect to see.
Isn't this exactly what ARP spoofing does? Can send traffic anywhere with ARP
-
@deanfourie said in ARP reports bogons:
I was not able to ping anything Local or remote (internet).
So you could not ping the pfSense IP, 192.168.1.1? Did you look in this device's ARP table? What did it show for this IP? Nothing?
Is this device wired or wireless?
-
You don't by chance have any wireless repeaters in play?
-
No, I cannot ping ANYTHING including pfSense.
No wireless repeaters at all.
I didn't check the ARP table to be fair, I have very short windows to test as it's so intermittent. I will do this next time.
I am only assuming it's ARP related because it is behaving like it is ARP related.
On top of that, arpwatch is reporting that there is a bogon at 0.0.0.0 on all the host MACs which now further leads me to think that's it's ARP related.
-
@deanfourie said in ARP reports bogons:
arpwatch is reporting that there is a bogon at 0.0.0.0 on all the host MACs
Again, this is a PROBE!! You posted your pcap - that is not anything reporting that its IP is 0.0.0.0 at that MAC. That is an ARP probe, and completely normal to see. Or it is a gratuitous ARP.
From your pcap
You would prob have found your problem already if you were not so obsessed with what arpwatch is reporting..
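For reference, a minimal Python sketch of how the ARP packets discussed in this post can be told apart. It follows the RFC 5227 definitions as I understand them: a probe is an ARP request whose sender IP is 0.0.0.0, and a gratuitous ARP (announcement) has sender IP equal to target IP. The function name and example addresses are made up.

```python
ARP_REQUEST = 1
ARP_REPLY = 2

def classify_arp(opcode, sender_ip, target_ip):
    """Classify an ARP packet from its opcode and the two IP fields."""
    if opcode == ARP_REQUEST and sender_ip == "0.0.0.0":
        # The host does not own an address yet; it is checking target_ip is free.
        return "probe"
    if sender_ip == target_ip:
        # The host is announcing its own address.
        return "gratuitous"
    return "request" if opcode == ARP_REQUEST else "reply"

print(classify_arp(ARP_REQUEST, "0.0.0.0", "192.168.1.50"))
print(classify_arp(ARP_REQUEST, "192.168.1.50", "192.168.1.50"))
print(classify_arp(ARP_REPLY, "192.168.1.1", "192.168.1.50"))
```

The 0.0.0.0 that arpwatch logs is the sender IP field of a probe, not an answer telling anyone that some IP "is at 0.0.0.0".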
-
@johnpoz haha ok ok.
This time I'll grab another capture and also check the ARP table and report back!
-
@deanfourie so all your devices are wireless? You have no wired devices at all, other than pfSense? And your pfSense is a VM, and your AP plugs into what?
A drawing of your network and what is plugged into what could be helpful in figuring out what is going on..
Can your devices ping each other normally? It is quite possible on an AP to do L2 isolation, where clients cannot actually talk to each other anyway.
You don't have a switch, and things plugged into this switch? A drawing of what is plugged into what, and what cannot ping what when this happens, would be very helpful in pinning down the central point that could fail and cause your problem. If everything is wireless and you cannot ping through the wireless to your pfSense that is wired, that would scream the AP. If there is a switch and devices on the switch can talk to each other - again, that screams AP. If wireless devices can ping each other but cannot ping stuff on the switch, then that says switch, etc.
When you said your lease was fine, then its life was within the 1st 50% of your total lease time.. How long is your lease set for exactly? I believe it defaults to 2 hours, I adjusted mine to 4 days.. Because I have no need for a short lease in my setup..
Also a look at your arp table before when everything is working and when it fails would be helpful.. Normally clients have a very short lifetime on arp..
Windows is really short, like 30 seconds with a random multiplier between 0.5 and 1.5, so you're looking at like a 45 second ARP cache max, or like 15 seconds min.
You can adjust that..
$ netsh interface ipv4 show interface 16
Interface Local Parameters
----------------------------------------------
IfLuid : ethernet_32769
IfIndex : 16
State : connected
Metric : 20
Link MTU : 1500 bytes
Reachable Time : 19500 ms
Base Reachable Time : 30000 ms
Retransmission Interval : 1000 ms
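The arithmetic behind those numbers, as a small Python sketch. It assumes the reachable time is simply the base value multiplied by a random factor between 0.5 and 1.5, as described above; the function names are made up.

```python
MIN_RANDOM_FACTOR = 0.5
MAX_RANDOM_FACTOR = 1.5

def reachable_time_range_ms(base_ms):
    """Lowest and highest reachable time for a given base reachable time."""
    return base_ms * MIN_RANDOM_FACTOR, base_ms * MAX_RANDOM_FACTOR

def implied_factor(reachable_ms, base_ms):
    """Random factor implied by the values shown in the netsh output."""
    return reachable_ms / base_ms

print(reachable_time_range_ms(30000))   # bounds in ms for a 30000 ms base
print(implied_factor(19500, 30000))     # factor behind the 19500 ms shown above
```

So the 19500 ms in that output sits inside the 15 to 45 second window.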
Look in your ARP table: are the entries all showing dynamic, or stale? If you're seeing stale for an IP you talk to a lot, then something is really wrong with ARP.
To rule out it being just an ARP issue, you could set a static ARP entry for the device. Can you ping it then when you have this problem, or does it still not ping?
So for example, on your device that you said could not ping pfSense: if you set a static ARP entry, and then the problem happens again and you still cannot ping, your issue is more than just ARP, and the ARP issue is just a symptom of a bigger networking issue.
Set a static ARP entry on your device for pfSense, and on pfSense for your device's IP. When it happens again, if they cannot talk even with static ARP set up, then you have a general complete loss of connectivity problem - and not just something dropping ARP, etc.
-
@johnpoz not all wireless, some are wired.
I don't think it's client isolation on the AP, because when everything is working there is no client isolation, and everything works perfectly.
I'll do some testing as well with the wired clients and see if they are experiencing the same behavior. I'm never on a wired device as it always just happens so late.
-
@deanfourie see my edit about looking at the arp cache, etc. windows devices have a really short default arp cache.. you could try setting static arp entries to see if that removes those devices from the problem or not, etc..
And again - a drawing, even if on a napkin with crayon that you snap a picture of on your phone to post, would give us a clear understanding of how everything is connected. Once we know which devices are affected and which are not, we can pinpoint the problem.
But unless all your devices were losing their lease, this has nothing to do with pfSense. When you say devices cannot ping each other, pfSense is not part of the conversation - for devices on the same network, pfSense is not involved in their conversations. So if device A cannot ping device B that is on the same network as A, and they both have IPs, then pfSense has nothing to do with this issue.
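To make the "pfSense is not part of the conversation" point concrete, here is a small Python sketch of the decision a host makes before sending a packet: if the destination is inside its own subnet it ARPs for the destination directly, otherwise it sends to its gateway. The addresses and the function name are made up for the example.

```python
import ipaddress

def needs_router(src_ip, dst_ip, prefix_len):
    """True if traffic from src to dst has to go through the gateway."""
    local_net = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    return ipaddress.ip_address(dst_ip) not in local_net

# Hypothetical hosts on a /24.
print(needs_router("192.168.1.20", "192.168.1.30", 24))  # same network: switch/AP only
print(needs_router("192.168.1.20", "8.8.8.8", 24))       # off network: via the gateway
```

When the first case fails, the path in question is just client, AP and switch.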
-
@deanfourie just caught this in my packet CAP.
22:06:09.298158 IP 0.0.0.0.68 > 255.255.255.255.67: UDP, length 340
-
@deanfourie yeah that is a dhcp discover..
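A rough Python sketch for spotting such lines in a text capture. It assumes the tcpdump output format shown in the captured line, and it only looks at addresses and ports, so it flags any broadcast from a client that has no address yet (a discover, or a request sent before the client is configured). Telling those apart needs the DHCP message type from a verbose decode. The function name is made up.

```python
import re

# Matches "IP <src_ip>.<src_port> > <dst_ip>.<dst_port>: UDP" in tcpdump text output.
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): UDP"
)

def is_dhcp_client_broadcast(line):
    """True for UDP 0.0.0.0:68 -> 255.255.255.255:67."""
    m = LINE_RE.search(line)
    if not m:
        return False
    return m.groups() == ("0.0.0.0", "68", "255.255.255.255", "67")

captured = "22:06:09.298158 IP 0.0.0.0.68 > 255.255.255.255.67: UDP, length 340"
print(is_dhcp_client_broadcast(captured))
```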
So you only run dhcp on vlan 11, 10 and 12 are all set static on the devices.
And you're saying all your devices lose connectivity.
Where is the routing happening - at your main switch there? Because you show only a transit network, that /27, from pfSense to your main switch. It's black, not red for trunk?
/27 when you have all of rfc1918 to use seems a bit tight. What are these other vlans IP ranges?
But looks like green your iot vlan is same as your transit?
Curious why you don't just use /24 everywhere and match up your vlan id with your 3rd octet..
like 172.16.10/24 vlan 10, 172.16.11/24 vlan 11, 172.16.12/24 vlan 12, etc.
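The numbering scheme above, sketched in Python, along with the host count that makes a /27 feel tight. The `192.0.2.0/27` range is only a placeholder, since the actual transit range is not given in this thread, and the function names are made up.

```python
import ipaddress

def vlan_subnet(vlan_id):
    """Map a VLAN ID to 172.16.<vlan_id>.0/24 (only works for IDs 0-255)."""
    if not 0 <= vlan_id <= 255:
        raise ValueError("VLAN ID does not fit in the third octet")
    return ipaddress.ip_network(f"172.16.{vlan_id}.0/24")

def usable_hosts(network):
    """Addresses left after removing the network and broadcast addresses."""
    return network.num_addresses - 2

for vlan in (10, 11, 12):
    print(vlan, vlan_subnet(vlan))

print(usable_hosts(ipaddress.ip_network("192.0.2.0/27")))  # placeholder /27
print(usable_hosts(vlan_subnet(10)))
```

A /27 leaves 30 usable addresses against 254 for a /24.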
-
@johnpoz One thing I also noticed is that it doesn't appear that the DHCP leases are renewing.
I see a lease end time of say 20:30, and at 21:30 the lease still shows an end time of 20:30 on the same date, which I take to mean the client did not renew its lease.
I would expect it to display an up-to-date lease of say 8 hours from the old lease end time.