ARP Poisoning Symptoms on VLAN Interfaces with DHCP



  • Hello Community,

    We are seeing a strange issue where devices on the network are unable to get DHCP leases. Even when they do get one, the behavior is inconsistent. Here are some details:

    Setup:
    2 pfSense routers in HA mode, with CARP address.
    em2    - 172.22.160.3/23 | DHCP: Yes
    em2.200 - 172.22.200.3/23 | DHCP: Yes
    Connected to:
    Trunk with native VLAN 160 and allowed VLANs 160,200
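For reference, a minimal sketch of which addresses each /23 above actually covers, using only the interface addresses listed in the setup. The helper names (`network_of`, `owning_interface`) are made up for illustration.

```python
import ipaddress

# Interface addresses exactly as listed in the setup above.
INTERFACES = {"em2": "172.22.160.3/23", "em2.200": "172.22.200.3/23"}


def network_of(cidr):
    """Return the network that an interface address like 172.22.200.3/23 sits in."""
    return ipaddress.ip_interface(cidr).network


def owning_interface(ip):
    """Return the name of the interface whose subnet contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, cidr in INTERFACES.items():
        if addr in network_of(cidr):
            return name
    return None


for name, cidr in INTERFACES.items():
    net = network_of(cidr)
    print(name, net, "first:", net[0], "last:", net[-1])
```

This is only a sanity check that both 172.22.200.x and 172.22.201.x addresses belong to the em2.200 subnet.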

    Problem:
    DHCP works fine on the em2 interface. However, on em2.200, some devices are unable to obtain a lease; they simply time out while requesting one. Devices that already have a lease either keep it or drop it and eventually time out, so the behavior is very random.

    We see the following in system logs:
    Nov 8 09:34:08 kernel arp: 172.22.200.163 moved from c4:67:b5:32:70:fa to f8:27:93:c1:ae:1d on em2.200
    Nov 8 09:34:06 kernel arp: 172.22.201.216 moved from c4:67:b5:36:82:96 to fc:db:b3:20:48:16 on em2.200
    Nov 8 09:34:04 kernel arp: 172.22.201.211 moved from c4:67:b5:36:82:96 to 18:b4:30:55:00:f8 on em2.200
    Nov 8 09:33:59 kernel arp: 172.22.201.217 moved from c4:67:b5:36:82:96 to 18:b4:30:56:cc:ed on em2.200
    Nov 8 09:33:55 kernel arp: 172.22.201.103 moved from ac:cf:85:28:98:b5 to c4:67:b5:36:82:96 on em2.200
    Nov 8 09:33:52 kernel arp: 172.22.201.125 moved from c4:67:b5:36:82:96 to ac:37:43:51:43:ee on em2.200
    Nov 8 09:33:46 kernel arp: 172.22.201.125 moved from ac:37:43:51:43:ee to c4:67:b5:36:82:96 on em2.200
    Nov 8 09:33:42 kernel arp: 172.22.201.8 moved from c4:67:b5:36:82:96 to a4:d1:8c:67:a5:eb on em2.200
    Nov 8 09:33:37 kernel arp: 172.22.200.163 moved from c4:67:b5:36:82:96 to c4:67:b5:32:70:fa on em2.200
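To make the pattern in these log lines easier to see, here is a minimal sketch that groups the "moved from ... to ..." messages by MAC address and counts how many distinct IPs each MAC has been associated with. The regular expression assumes the log format shown above, the sample is a subset of the lines above, and the function name `ips_per_mac` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches kernel messages of the form shown in the log excerpt above.
ARP_MOVED = re.compile(
    r"arp: (?P<ip>\d+\.\d+\.\d+\.\d+) moved from "
    r"(?P<old>[0-9a-f:]{17}) to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)"
)


def ips_per_mac(lines):
    """Map each MAC address to the set of IPs it appeared with (old or new side)."""
    seen = defaultdict(set)
    for line in lines:
        m = ARP_MOVED.search(line)
        if m:
            seen[m["old"]].add(m["ip"])
            seen[m["new"]].add(m["ip"])
    return seen


# A subset of the log lines from the post.
SAMPLE = """\
Nov 8 09:33:55 kernel arp: 172.22.201.103 moved from ac:cf:85:28:98:b5 to c4:67:b5:36:82:96 on em2.200
Nov 8 09:33:52 kernel arp: 172.22.201.125 moved from c4:67:b5:36:82:96 to ac:37:43:51:43:ee on em2.200
Nov 8 09:33:42 kernel arp: 172.22.201.8 moved from c4:67:b5:36:82:96 to a4:d1:8c:67:a5:eb on em2.200
Nov 8 09:33:37 kernel arp: 172.22.200.163 moved from c4:67:b5:36:82:96 to c4:67:b5:32:70:fa on em2.200
"""

# List MACs with the most associated IPs first.
counts = ips_per_mac(SAMPLE.splitlines())
for mac, ips in sorted(counts.items(), key=lambda kv: -len(kv[1])):
    print(mac, len(ips), sorted(ips))
```

A MAC that shows up against many different IPs is a candidate for whatever is answering ARP on behalf of other hosts, though this alone does not prove which device is at fault.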

    tcpdump shows that the server sees and responds to the requests, but it seems the requester never sees the reply (unsure about this part; we weren't able to verify it on the client side):
    4c:57:ca:67:04:5d > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 4c:57:ca:67:04:5d, length 300
    00:ec:ac:cd:a9:d4 > 4c:57:ca:67:04:5d, ethertype IPv4 (0x0800), length 349: 172.22.200.3.67 > 172.22.200.224.68: BOOTP/DHCP, Reply, length 307
    4c:57:ca:67:04:5d > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 4c:57:ca:67:04:5d, length 300
    00:ec:ac:cd:a9:d4 > 4c:57:ca:67:04:5d, ethertype IPv4 (0x0800), length 349: 172.22.200.3.67 > 172.22.200.224.68: BOOTP/DHCP, Reply, length 307
    4c:57:ca:67:04:5d > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 4c:57:ca:67:04:5d, length 300

    Checking the dhcpd.leases file reveals a large number of abandoned IPs, all of which belong to em2.200's subnet/DHCP service.
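A minimal sketch of how the abandoned entries could be listed per subnet, assuming the leases file is in the ISC dhcpd format (`lease <ip> { ... binding state abandoned; ... }`). The sample lease entries below are made up for illustration and are not taken from our routers; the function name `abandoned_in_subnet` is likewise hypothetical. The simple regular expression would also miss edge cases such as a `}` inside a quoted value.

```python
import ipaddress
import re

# One lease block: "lease <ip> { ... }" (assumes no "}" inside the body).
LEASE_BLOCK = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
# Only lines that start with "binding state", not "next binding state".
BINDING_STATE = re.compile(r"^\s*binding state (\w+);", re.MULTILINE)


def abandoned_in_subnet(leases_text, subnet):
    """Return IPs inside subnet whose most recent lease entry is abandoned."""
    net = ipaddress.ip_network(subnet, strict=False)
    latest = {}
    # Later entries for the same IP overwrite earlier ones.
    for ip, body in LEASE_BLOCK.findall(leases_text):
        m = BINDING_STATE.search(body)
        if m:
            latest[ip] = m.group(1)
    hits = [
        ip for ip, state in latest.items()
        if state == "abandoned" and ipaddress.ip_address(ip) in net
    ]
    return sorted(hits, key=ipaddress.ip_address)


# Hypothetical sample entries, for illustration only.
SAMPLE_LEASES = """\
lease 172.22.200.50 {
  binding state active;
  next binding state free;
}
lease 172.22.201.10 {
  binding state abandoned;
}
lease 172.22.160.20 {
  binding state abandoned;
}
"""

print(abandoned_in_subnet(SAMPLE_LEASES, "172.22.200.3/23"))
```

Running the same function against each interface's subnet would confirm whether the abandoned addresses really are confined to em2.200.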

    Troubleshoot:
    These are the steps we've taken so far, without success:
    1. Moved DHCP from primary to secondary (turned off Sync - DHCP to do this). The secondary router was not showing the ARP-related log entries before DHCP was moved to it.
    2. Power cycled routers.
    3. Cleared ARP cache, DHCP leases.

    What throws me off is that we don't see the same issue on em2 itself or on other interfaces that are not doing any VLAN tagging, even though all networks are designed in the same fashion.
    Any help would be really appreciated.

    • dplqb -
