Vlan routing question… vlan traffic drops

Disturbed1

good day all, been dealing with a little glitch and not 100% sure where to start poking….

i have installed 5 APs at a client's site for public wifi...

each ap is connected to a wireless modem aimed back at my tower...

each modem is configured in bridge mode, modem management is set to vlan3, the lan port is configured to vlan8...

i have tried two different methods of configuring my pfsense gateway. first was to share my pfsense lan interface, but i had problems with captive portal. since then i installed a second nic, configured it with 172.16.1.1/32 with services off (dhcp, etc), and moved all my vlans to this new interface...

wan/static/spoofed MAC (00:30:48:b3:04:d6)
lan/10.212.101.1/24 primary nic (all wireless clients get net access from) with nat 1:1 xxx.xxx.236.0/24 = 10.212.101.0/24 (MAC 00:30:48:b3:04:d7)

opt2/172.16.1.1/32 second nic spoofed MAC (00:30:48:b3:04:d8)
vlan3/172.16.30.0/24 modem management
vlan4/172.16.10.0/24 my office POS
vlan8/172.16.80.0/22 speedway wifi setup (have had up to 361 wifi ip's assigned and about 50 people connected during race events)
vlan9/172.16.90.0/28 cisco switches
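
For anyone sanity-checking the addressing above, here is a minimal Python sketch using the standard `ipaddress` module. It only tests that the four vlan subnets as listed do not overlap and that the /22 on vlan8 has room for the 361 leases mentioned. The dictionary keys and helper names are made up for illustration and are not anything pfsense uses.

```python
import ipaddress

# Subnets copied from the list above; the keys are just labels for this sketch.
vlans = {
    "vlan3": ipaddress.ip_network("172.16.30.0/24"),  # modem management
    "vlan4": ipaddress.ip_network("172.16.10.0/24"),  # office POS
    "vlan8": ipaddress.ip_network("172.16.80.0/22"),  # speedway wifi
    "vlan9": ipaddress.ip_network("172.16.90.0/28"),  # cisco switches
}


def overlapping_pairs(networks):
    """Return every pair of labels whose subnets share at least one address."""
    names = sorted(networks)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if networks[a].overlaps(networks[b])
    ]


def usable_hosts(net):
    """Addresses left after removing the network and broadcast addresses."""
    return net.num_addresses - 2


overlaps = overlapping_pairs(vlans)
vlan8_hosts = usable_hosts(vlans["vlan8"])

print("overlapping subnets:", overlaps)
print("usable hosts on vlan8:", vlan8_hosts)
print("361 leases fit:", 361 <= vlan8_hosts)
```

If the overlap list comes back empty and the lease count fits, the subnet plan itself is probably not what is breaking, and the problem is more likely somewhere else.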

everything connects fine for a while, about 10 hours, then all vlan traffic stops... the only way i can get services back up on the vlans is to reboot pfsense...
all lan traffic remains fine during this time...

all firewall rules have been temporarily set to open/admit all to test

this is the only thing showing up in my system log....

00:27:22:de:34:f9 = ubiquiti Unifi AP
8c:7b:9d:4b:cb:6d = wifi client
50:ea:d6:4f:9f:27 = wifi client
d4:20:6d:7a:df:25 = wifi client
28:e0:2c:d0:55:b6 = Wifi client
8c:58:77:1f:ca:4e = wifi client

Sep 15 08:31:38	kernel: arp: 172.16.82.202 moved from 00:27:22:de:34:f9 to 28:e0:2c:d0:55:b6 on re1_vlan8
Sep 15 08:18:25	kernel: arp: 172.16.82.145 moved from 00:27:22:de:34:f9 to 8c:7b:9d:4b:cb:6d on re1_vlan8
Sep 15 08:13:22	kernel: arp: 172.16.82.145 moved from 8c:7b:9d:4b:cb:6d to 00:27:22:de:34:f9 on re1_vlan8
Sep 15 08:04:03	kernel: arp: 172.16.82.75 moved from 00:27:22:de:34:f9 to d4:20:6d:7a:df:25 on re1_vlan8
Sep 15 07:37:31	kernel: arp: 172.16.80.3 moved from 00:27:22:de:34:f9 to 50:ea:d6:4f:9f:27 on re1_vlan8
Sep 15 07:28:30	check_reload_status: Reloading filter
Sep 15 07:28:27	check_reload_status: Syncing firewall
Sep 15 07:28:14	kernel: arp: 172.16.80.3 moved from 50:ea:d6:4f:9f:27 to 00:27:22:de:34:f9 on re1_vlan8
Sep 15 07:27:26	check_reload_status: Reloading filter
Sep 15 07:27:26	check_reload_status: Syncing firewall
Sep 15 07:24:29	syslogd: kernel boot file is /boot/kernel/kernel
Sep 15 07:24:29	syslogd: exiting on signal 15
Sep 15 07:24:29	check_reload_status: Syncing firewall
Sep 15 07:21:10	kernel: arp: 172.16.82.183 moved from 8c:58:77:1f:ca:4e to 00:27:22:de:34:f9 on re1_vlan8
Sep 15 07:21:05	kernel: arp: 172.16.82.183 moved from 00:27:22:de:34:f9 to 8c:58:77:1f:ca:4e on re1_vlan8

when i look at rrd system processor, the spikes seem to match with the times the vlans drop….

i'm thinking it's something to do with pfsense config or hardware but not sure where to poke...
so just to try something out, i have disabled CP and moved vlans back to lan interface at 7:15am and rebooted gateway, as u see on rrd graph the activity has dropped back down...

so needless to say... i'm a little stumped here....

[attachment: status_rrd_graph_img1.png]

[attachment: status_rrd_graph_img2.png]

cmb

Looks like you have multiple issues. For one, the "ARP moved" logs show several IP conflicts, with addresses switching between Ubiquiti and Apple MACs, between Ubiquiti and HTC MACs, and others.
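
To see the conflicts at a glance, something like the following Python sketch could group the "arp: ... moved from ... to ..." lines by IP and list which MACs have claimed each address. The regex is written against the log lines quoted in this thread only, and the function names are hypothetical, so treat it as a starting point rather than a general syslog parser.

```python
import re
from collections import defaultdict

# Matches the kernel "arp: <ip> moved from <mac> to <mac> on <iface>" lines
# quoted in this thread; other log formats are not handled.
ARP_MOVED = re.compile(
    r"arp: (?P<ip>\d+\.\d+\.\d+\.\d+) moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)"
)


def macs_per_ip(lines):
    """Map each IP to the set of MACs that claimed it in the log."""
    seen = defaultdict(set)
    for line in lines:
        m = ARP_MOVED.search(line)
        if m:
            seen[m["ip"]].update((m["old"], m["new"]))
    return dict(seen)


def common_macs(seen):
    """MACs that show up in the conflict set of every flapping IP."""
    sets = list(seen.values())
    return set.intersection(*sets) if sets else set()


# The ARP lines from the posted system log.
sample = [
    "Sep 15 08:31:38 kernel: arp: 172.16.82.202 moved from 00:27:22:de:34:f9 to 28:e0:2c:d0:55:b6 on re1_vlan8",
    "Sep 15 08:18:25 kernel: arp: 172.16.82.145 moved from 00:27:22:de:34:f9 to 8c:7b:9d:4b:cb:6d on re1_vlan8",
    "Sep 15 08:13:22 kernel: arp: 172.16.82.145 moved from 8c:7b:9d:4b:cb:6d to 00:27:22:de:34:f9 on re1_vlan8",
    "Sep 15 08:04:03 kernel: arp: 172.16.82.75 moved from 00:27:22:de:34:f9 to d4:20:6d:7a:df:25 on re1_vlan8",
    "Sep 15 07:37:31 kernel: arp: 172.16.80.3 moved from 00:27:22:de:34:f9 to 50:ea:d6:4f:9f:27 on re1_vlan8",
    "Sep 15 07:28:14 kernel: arp: 172.16.80.3 moved from 50:ea:d6:4f:9f:27 to 00:27:22:de:34:f9 on re1_vlan8",
    "Sep 15 07:21:10 kernel: arp: 172.16.82.183 moved from 8c:58:77:1f:ca:4e to 00:27:22:de:34:f9 on re1_vlan8",
    "Sep 15 07:21:05 kernel: arp: 172.16.82.183 moved from 00:27:22:de:34:f9 to 8c:58:77:1f:ca:4e on re1_vlan8",
]

conflicts = macs_per_ip(sample)
shared = common_macs(conflicts)

for ip, macs in sorted(conflicts.items()):
    print(ip, sorted(macs))
print("MAC involved in every move:", sorted(shared))
```

Run against the lines posted above, the one MAC common to every move is the one identified as the Unifi AP, so that device is probably the first thing to look at.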

My guess is that the increased CPU usage is just a symptom of some other problem. What do your traffic graphs look like at those times? Rebooting can temporarily clear up so many problems internal to your network that it's not necessarily indicative of a firewall problem. An IP conflict on your gateway IP would be cleared up temporarily by a reboot, amongst other possible issues. A packet capture on the parent interface of the VLANs, taken while it isn't working, would be telling.
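
As a rough sketch of that capture, the Python snippet below just assembles a `tcpdump` command line for the parent interface. It assumes the parent is `re1` (inferred from `re1_vlan8` in the logs) and that you run the resulting command from a shell on the firewall. The helper name and defaults are made up for illustration. `-e` prints the link-level header so the MACs are visible, and the `vlan 8` filter limits the capture to frames tagged for that VLAN.

```python
import shlex


def capture_command(parent_if, vlan_id=None, count=500, outfile=None):
    """Build a tcpdump argument list for tagged traffic on a parent interface.

    parent_if: physical interface carrying the VLANs (assumed, e.g. "re1").
    vlan_id:   capture only this VLAN tag, or any tagged frame if None.
    count:     stop after this many packets so the capture ends by itself.
    outfile:   optional pcap file to write instead of printing packets.
    """
    cmd = ["tcpdump", "-n", "-e", "-i", parent_if, "-c", str(count)]
    if outfile:
        cmd += ["-w", outfile]
    # The filter expression goes last; "vlan" alone matches any 802.1Q frame.
    cmd.append("vlan" if vlan_id is None else f"vlan {vlan_id}")
    return cmd


cmd = capture_command("re1", vlan_id=8)
print(shlex.join(cmd))
```

If tagged frames for VLAN 8 are still arriving on the parent while clients have no service, the fault is more likely on the firewall side. If nothing tagged shows up at all, look upstream at the switches and the wireless bridge links.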