Captive Portal blocking white listed MAC addresses in 2.5.0

free4

Seems that no one has the same issue...

I would say that more information is needed before submitting a bug report. I'm not denying there could be some issue somewhere but we haven't located it yet.

Execute the command ipfw table all list using the Diagnostics -> Command Prompt menu.
Are you able to see the MAC address of any phone that is not getting internet? If not, can you see it in the ARP table? (You can see that in Diagnostics -> ARP table.)
Then go to the captive portal settings and hit "Save" on the whitelist as you mentioned. Are you able to see the MAC address in ipfw table all list now?
(ipfw is the technology used under the hood by the captive portal)
Are you using idle timeout or hard timeout in your captive portal settings?

Gertjan

The MACs that are whitelisted are in the

--- table(ZONE_pipe_mac), set(0) ---

table.

I have:

--- table(cpzone1_pipe_mac), set(0) ---
 xx:1f:a1:54:98:c9 any 2007 0 0 0
 any xx:1f:a1:54:98:c9 2006 0 0 0
 xx:3b:df:0d:ec:31 any 2001 6453564 2233274966 1615640927
 any xx:3b:df:0d:ec:31 2000 4902299 823308096 1615640927
 xx:76:35:f2:a9:0e any 2005 0 0 0
 any xx:76:35:f2:a9:0e 2004 0 0 0
 xx:8d:79:91:ec:52 any 2003 0 0 0
 any xx:8d:79:91:ec:52 2002 0 0 0

As you can see, xx:3b:df:0d:ec:31 uses a lot of traffic (an Xbox on Wi-Fi).

AndrewDuey

@free4 I will check and see what it looks like. Thanks for the troubleshooting steps. Right now I do see 2 rules for each MAC (one inbound, one outbound) but I can't confirm if the devices are working.

We are using a hard timeout in the captive portal settings (although that should be unrelated since they're white listed).

I'm hoping to replicate and grab a packet capture on Monday.

Thanks!

AndrewDuey

@andrewduey Well, users are not reporting any issues (after 3 business days) so there's nothing to troubleshoot anymore (for better or worse).

At this point it feels like in the upgrade from 2.4.4 to 2.5.0 the Captive Portal MAC allow list was transferred in the management GUI but not actually entered into the firewall allow list under the hood. When I get some free time I'm going to check some other devices and see if I can reproduce. For now it feels like the work-around of just opening and saving the entries was enough to fix things up.

If I find out anything more I'll report back. Thanks @free4 and @Gertjan !

AndrewDuey

Well, it's still happening. We found more computers that cannot access the internet (traverse the firewall specifically where captive portal is being used) whose MAC addresses are in the Services -> Captive Portal -> <<zone>> -> MACs list.

From the computer with the static IP set we could NOT ping the default gateway (which is the pfsense firewall). We could see the MAC address in our switch so we know we have the right MAC address.

I dumped the ipfw table before and after fixing it and it's identical other than the traffic counters.

To temporarily fix it, I just open that entry in the captive portal allow list, save it (without changing anything), and then pfsense starts passing traffic again.

I have screen scrapes, screen shots, and packet captures but didn't see anything.

We think we have machines that were working and then stopped for unknown reasons, but the re-save work around "fixes" them (still confirming details).

Any other thoughts??

Packet capture available upon request.

--Andrew

Gertjan

Your captive portal setting 'MACs' page should list this :

I have 4 devices listed.
Note the captive portal's 'zone' name : "cpzone1".

Now, check the back end : the actual firewall :

ipfw table all list

The table to be checked, in my case, is "cpzone1_pipe_mac" :

....
--- table(cpzone1_pipe_mac), set(0) ---
 88:1f:a1:54:aa:c9 any 2007 0 0 0
 any 88:1f:a1:54:aa:c9 2006 0 0 0
 4c:3b:df:0d:bb:31 any 2001 967641 215030779 1616644243
 any 4c:3b:df:0d:bb:31 2000 1091854 178335895 1616643391
 7c:76:35:f2:cc:0e any 2005 0 0 0
 any 7c:76:35:f2:cc:0e 2004 0 0 0
 4c:8d:79:91:dd:52 any 2003 0 0 0
 any 4c:8d:79:91:dd:52 2002 0 0 0
....

Which checks out. I have 4 pairs: in each pair, one entry is a 'pipe' for the upload traffic, the other for the download traffic.
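The pairing check described above can be automated. The following is a minimal sketch, assuming the output of ipfw table all list has the layout shown in this thread (a "--- table(NAME), set(N) ---" header followed by rows of source, destination, pipe number and three counters). The function names and the sample MAC addresses are made up for illustration.

```python
import re
from collections import defaultdict

# Header lines look like: --- table(cpzone1_pipe_mac), set(0) ---
TABLE_HEADER = re.compile(r"^--- table\((?P<name>[^)]+)\), set\(\d+\) ---$")


def parse_tables(text):
    """Return {table_name: [(src, dst, pipe, packets, bytes, timestamp), ...]}."""
    tables = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = TABLE_HEADER.match(line)
        if match:
            current = tables.setdefault(match.group("name"), [])
            continue
        fields = line.split()
        # Assumed MAC-table row layout: src dst pipe packets bytes timestamp
        if current is not None and len(fields) == 6 and all(f.isdigit() for f in fields[2:]):
            src, dst = fields[0], fields[1]
            pipe, packets, nbytes, stamp = map(int, fields[2:])
            current.append((src, dst, pipe, packets, nbytes, stamp))
    return tables


def unpaired_macs(rows):
    """Return MACs that do not have both a 'MAC any' and an 'any MAC' row."""
    directions = defaultdict(set)
    for src, dst, *_ in rows:
        if dst == "any":
            directions[src].add("from")
        elif src == "any":
            directions[dst].add("to")
    return sorted(mac for mac, seen in directions.items() if seen != {"from", "to"})


# Illustrative data only: the second MAC is deliberately missing its 'any MAC' row.
SAMPLE = """\
--- table(cpzone1_pipe_mac), set(0) ---
 aa:bb:cc:00:00:01 any 2001 10 1000 1616644243
 any aa:bb:cc:00:00:01 2000 12 1200 1616643391
 aa:bb:cc:00:00:02 any 2003 0 0 0
"""

if __name__ == "__main__":
    tables = parse_tables(SAMPLE)
    print(unpaired_macs(tables["cpzone1_pipe_mac"]))
```

Comparing the MACs found this way with the list shown in the GUI would reveal an entry that exists in one place but not the other.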

Why not show the issue?
For example: a pipe, or even a whole pair, missing from the table while, at the same moment, it is shown in the GUI (my first image).

I've never seen such a thing happening.

Note : three out of four MAC pairs belong to people that do not often visit 'my site with pfSense'.
Only the guy that uses "4c:3b:df:0d:bb:31" is present, and using a lot of traffic.

AndrewDuey

@gertjan Thanks for the reply. The allow list does look similar.
(Screenshot: Mac address list before save-redacted.jpg)

And from ipfw table all list (prior to hitting the save button, non-related info stripped):

[21.02-RELEASE][root@fw01]/root: ipfw table all list
--- table(cp_ifaces), set(0) ---
igb0.401 2200 9495659 9591000424 1616605109
igb0.400 2100 408159055 239208262930 1616605110
--- table(cpzone_auth_up), set(0) ---
--- table(cpzone_pipe_mac), set(0) ---
 b8:ca:3a:b4:39:59 any 2111 7401110 2503360654 1616560849
 any b8:ca:3a:b4:39:59 2110 5790721 2042888077 1616605109
--- table(public_pipe_mac), set(0) ---
 b8:ca:3a:b4:39:59 any 2135 0 0 0
 any b8:ca:3a:b4:39:59 2134 0 0 0

The ipfw table all list looks identical after hitting the save button (other than the traffic counters) but immediately after hitting save pfsense starts passing traffic.
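A before/after comparison like this is easier to trust when the counters are stripped mechanically rather than compared by eye. This is a rough sketch, assuming the last three columns of every row are the packet count, byte count and timestamp (which is how the listings in this thread read). The function names and sample data are hypothetical.

```python
def rule_set(listing):
    """Reduce an `ipfw table all list` dump to a set of rules without counters."""
    rules = set()
    table = None
    prefix = "--- table("
    for raw in listing.splitlines():
        line = raw.strip()
        if line.startswith(prefix):
            # Keep the table name so identical MACs in two zones stay distinct.
            table = line[len(prefix):].split(")", 1)[0]
            continue
        fields = line.split()
        if table is None or len(fields) < 4:
            continue
        # Assumption: the last three columns are packets, bytes, timestamp.
        rules.add((table, *fields[:-3]))
    return rules


def diff_rules(before, after):
    """Return (removed, added) rule sets between two dumps."""
    old, new = rule_set(before), rule_set(after)
    return old - new, new - old


# Illustrative dumps: same rules, different traffic counters.
BEFORE = """\
--- table(cpzone_pipe_mac), set(0) ---
 aa:bb:cc:00:00:01 any 2111 100 5000 1616560849
 any aa:bb:cc:00:00:01 2110 90 4000 1616605109
"""
AFTER = """\
--- table(cpzone_pipe_mac), set(0) ---
 aa:bb:cc:00:00:01 any 2111 250 9000 1616606000
 any aa:bb:cc:00:00:01 2110 240 8000 1616606001
"""

if __name__ == "__main__":
    removed, added = diff_rules(BEFORE, AFTER)
    print("removed:", removed)
    print("added:", added)
```

Two empty sets would confirm that the save changed nothing visible in the tables, which is what was observed here.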

The only odd thing is that that mac address shows up in a different, unrelated, captive portal zone and does not show up in the pfsense web GUI. I wouldn't think this would cause a problem.

I'll PM you a few more screen-shots that are complete.

Thanks for lending your eyes to this!

free4

Hello all, hello @andrewduey

First, sorry for the delay

Andrew reached me privately and gave me more details, including a packet capture.
It seems wayyyy more strange than I first anticipated.

Here is what I can tell:

When clicking save, no rule of the impacted device is being changed. In fact, there is no rule being changed before/after the save.
According to the packet capture, no traffic is arriving at the firewall from the impacted device before clicking on "save". This is strange: we should at least see some DNS queries and other "standard traffic" reach pfSense (even if no DNS response is given because the device is blocked). The device is a Windows one, so we should at least see some DNS requests to *.live.com, *.microsoft.com, and other Windows domains. If I didn't know that clicking on save fixes the issue, I would have said that this is an ARP or a switch problem.

(An example of what DNS traffic between pfSense & a blocked device should look like. 192.168.101.1 is the pfSense, 192.168.101.100 is a linux server)

Since I don't have a clue what the issue could be, I will give some context and general knowledge on what happens when clicking save (maybe you'll see something that I missed?)

Here is what happens (translated to ipfw commands) when hitting "save" on the page.

$mac = 'aa:bb:cc:dd:ee:ff'; // The impacted MAC 
$pipeno = '24208'; // The pipe number
$cpzone = 'cpzone'; // the name of the captive portal zone

ipfw table {$cpzone}_pipe_mac delete any,{$mac}
ipfw table {$cpzone}_pipe_mac delete {$mac},any
ipfw pipe delete {$pipeno}
ipfw pipe delete {$pipeno+1}
ipfw pipe {$pipeno} config bw 0Kbit/s queue 100 buckets 16
ipfw pipe {$pipeno+1} config bw 0Kbit/s queue 100 buckets 16
ipfw table {$cpzone}_pipe_mac add any,{$mac} {$pipeno}
ipfw table {$cpzone}_pipe_mac add {$mac},any {$pipeno+1}

Everything happens in 1 command (the commands are dropped into a file, which is then loaded using ipfw -q /tmp/file_location). That's the reason why the traffic counters are kept despite the rule delete/add.
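To make the sequence above concrete, here is a small sketch that writes out the same eight commands for a given MAC, pipe number and zone. It only mirrors the commands as listed in this post and is not taken from the pfSense source; the function name is hypothetical, and whether the lines in the real batch file carry the leading "ipfw" is not confirmed here.

```python
def resave_commands(mac, pipeno, cpzone):
    """Build the delete/re-add command sequence described in the post.

    The commands are returned as text for inspection only; nothing is executed.
    """
    table = f"{cpzone}_pipe_mac"
    first, second = pipeno, pipeno + 1  # the two pipes belonging to this MAC
    return [
        # 1. Remove both table entries for the MAC.
        f"ipfw table {table} delete any,{mac}",
        f"ipfw table {table} delete {mac},any",
        # 2. Remove both pipes.
        f"ipfw pipe delete {first}",
        f"ipfw pipe delete {second}",
        # 3. Recreate the pipes (0Kbit/s means unlimited, per the post).
        f"ipfw pipe {first} config bw 0Kbit/s queue 100 buckets 16",
        f"ipfw pipe {second} config bw 0Kbit/s queue 100 buckets 16",
        # 4. Re-add the table entries, pointing at the recreated pipes.
        f"ipfw table {table} add any,{mac} {first}",
        f"ipfw table {table} add {mac},any {second}",
    ]


if __name__ == "__main__":
    for command in resave_commands("aa:bb:cc:dd:ee:ff", 24208, "cpzone"):
        print(command)
```

Printing the list for an affected MAC gives something to compare against the pipe and table state captured before and after a save.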

"pipeno" is a number that you can retrieve by typing ipfw table all list (or alternatively ipfw pipe list).

Ipfw pipes are the "speed limiter" system of the captive portal. Each whitelisted MAC (/allowed user) has two ipfw pipes (one for upload, one for download). Each pipe has a speed, in kbit/s (0kbit/s means unlimited traffic speed).

Some possible explanations for what's happening (which all feel super unlikely/impossible to me, but I don't have anything better to offer):

The ipfw pipes used by the impacted device got deleted or became unavailable (why?). Could you carefully check the ipfw pipes next time this issue happens?
I see that you have 2 captive portal zones. Maybe the upgrade caused some sort of conflict between the two zones?

Gertjan

@free4 said in Captive Portal blocking white listed MAC addresses in 2.5.0:

When clicking save, no rule of the impacted device is being changed. In fact, there is no rule being changed before/after the save.

Still, rules and pipes get destroyed. And recreated.
Because the available pipe array is scanned for "free slots" from bottom to top, the same pipe numbers get assigned again.

Does something make the pipes get stuck?
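The "same numbers come back" effect follows directly from a lowest-free-slot scan. This is a toy model of that idea, not pfSense's actual allocator: the function names and the start/stop range are assumptions chosen for illustration.

```python
def allocate_pair(used, start=2000, stop=64500):
    """Claim the lowest free even/odd pipe pair, scanning from the bottom up.

    `start` and `stop` are hypothetical bounds, not pfSense's real range.
    """
    for pipeno in range(start, stop, 2):
        if pipeno not in used and pipeno + 1 not in used:
            used.update((pipeno, pipeno + 1))
            return pipeno
    raise RuntimeError("no free pipe pair")


def release_pair(used, pipeno):
    """Free both pipes of a pair."""
    used.difference_update((pipeno, pipeno + 1))


if __name__ == "__main__":
    used = set(range(2000, 2008))      # four pairs in use: 2000 through 2007
    release_pair(used, 2002)           # "delete" the second entry's pipes
    print(allocate_pair(used))         # the scan finds the gap it just left
```

In this model, deleting a pair and immediately re-adding it always lands on the same numbers, so identical pipe numbers before and after a save do not show that nothing was recreated.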

@andrewduey said in Captive Portal blocking white listed MAC addresses in 2.5.0:

We could see the MAC address in our switch so we know we have the right MAC address.

Test: swap this switch for another, "dumb" one, just to make sure the switch isn't doing something strange.

AndrewDuey

@gertjan
Thanks for keeping with this one. I'm open to any further troubleshooting we can do however changing the switch is not feasible in this environment. 1) The captive portal interface is part of a VLAN trunk, and 2) the specific "captive" device in the captures is physically 10 miles from the firewall connected by 4 managed switches and 2 wireless bridges.

We have seen this behavior with multiple devices in that same captive portal zone at other locations as well (which is one big VLAN/broadcast domain -- not many devices (maybe 50 tops) but does span many buildings). One of the affected wireless devices (an Android based embedded system) was moved from the remote site to the main site so it was just 2 managed switches and one wireless AP away from the firewall and saw the same behavior (removing all the long distance links and bringing it within 20 feet of the firewall ;) )

We have about 25 devices mac whitelisted in this zone and at least 10 devices have had this issue (and they're scattered across different locations). Other devices which are not mac whitelisted have had no issue reaching the captive portal screen and logging in. The issue appears confined to devices which are in the mac white list. It's possible it affected more of the devices, but many of the devices are IoT in nature and so it's not always immediately apparent when they fall offline.

The problem did pop up after upgrading to pfsense 2.5.0 and no other switching or infrastructure changes were made at that time so I'm pretty sure it's not switching or network transport related.

Also, this config has been around for a long time now: this pfsense setup went into service in February 2010, and the same config has been steadily migrated (upgraded) to new versions and new hardware (currently using Netgate provided hardware, XG-1537 I believe). It's possible there are some artifacts in the config left over from the past decade of upgrades, although I wouldn't know what to look for.

@free4 : You mentioned "The ipfw pipes used by the impacted device got deleted or became unavailable (why?). could you carefully check the ipfw pipes next time this issue happens?" Besides looking at the output of "ipfw table all list" (which last time was identical other than the traffic counters) what would you suggest looking for? Or just grab another copy of the "ipfw table all list" tables before and after saving like last time?

To me, it feels like a FreeBSD bug possibly since the output of "ipfw table all list" looked the same but deleting and re-adding the mac seems to fix it up but I don't know pfsense under the hood nearly as well.

Thanks for continuing to dig into this.

--Andrew

free4

@andrewduey said in Captive Portal blocking white listed MAC addresses in 2.5.0:

what would you suggest looking for? Or just grab another copy of the "ipfw table all list" tables before and after saving like last time?

Yes, could you please execute ipfw table all list and ipfw pipe list before & after clicking on save?
This time, I would like to investigate the pipes as well.
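Once both outputs are captured, one quick check is whether every pipe number referenced in the MAC tables actually exists in the pipe list. The sketch below assumes that each pipe in ipfw pipe list starts a line with a zero-padded five-digit number and a colon; that format, the function names and the sample text are assumptions to verify against real output.

```python
import re

# Assumption: pipe entries begin a line like "02110: unlimited ...".
PIPE_LINE = re.compile(r"^(\d{5}):")


def pipes_defined(pipe_listing):
    """Collect pipe numbers that appear in an `ipfw pipe list` dump."""
    defined = set()
    for line in pipe_listing.splitlines():
        match = PIPE_LINE.match(line)
        if match:
            defined.add(int(match.group(1)))
    return defined


def pipes_referenced(table_listing):
    """Collect pipe numbers used by MAC rows in an `ipfw table all list` dump."""
    referenced = set()
    for line in table_listing.splitlines():
        fields = line.split()
        # Assumed MAC row layout: src dst pipe packets bytes timestamp
        if len(fields) == 6 and fields[2].isdigit():
            referenced.add(int(fields[2]))
    return referenced


def missing_pipes(table_listing, pipe_listing):
    """Return pipe numbers that a table points at but no pipe defines."""
    return sorted(pipes_referenced(table_listing) - pipes_defined(pipe_listing))


# Illustrative data only: pipe 2111 is referenced but not defined.
TABLES = """\
--- table(cpzone_pipe_mac), set(0) ---
 aa:bb:cc:00:00:01 any 2111 10 1000 1616560849
 any aa:bb:cc:00:00:01 2110 12 1200 1616605109
"""
PIPES = """\
02110: unlimited         0 ms burst 0
"""

if __name__ == "__main__":
    print(missing_pipes(TABLES, PIPES))
```

A non-empty result taken before the save, and an empty one after, would point at the pipes rather than the table entries.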

To me, it feels like a FreeBSD bug possibly since the output of "ipfw table all list" looked the same but deleting and re-adding the mac seems to fix it up but I don't know pfsense under the hood nearly as well.

Thanks for continuing to dig into this.

--Andrew

It could be...but in this case it would be related to some edge-case.

I mean: no one else has reported such an issue. You are not the only user using multiple captive portals and VLANs together. I believe the forum would be flooded with complaints if the issue was global.
I'm not denying there could be a kernel issue. There is definitely something going wrong there, and we will find out what's wrong.

Gertjan

@free4 said in Captive Portal blocking white listed MAC addresses in 2.5.0:

I believe the forum would be flooded with complaints if the issue was global.

Not only here.
The ipfw firewall, with MACs listed in a table for black or white listing, is just a FreeBSD command. If ipfw, or its pipe handling, were misbehaving, FreeBSD support would know about it.
pfSense hides these commands but executes them for you, and you could have entered them yourself on the command line.

If you can't tear down your network (smart switches, VLANs, bridges), I understand. But try to bring a troublesome device nearby, have it work 'locally' with just a switch, and see if you can replicate the issue.
For example: packet capturing shows the device, with its MAC, coming in, and the pipe doing nothing (blocking).
If so, the issue becomes 'FreeBSD'.

AndrewDuey

Well, it's been nearly a month and we have not seen the issue re-appear. At this point I feel like somehow even though the MAC addresses were listed in the table there was something not right and blocking traffic somewhere in BSD/pfsense. When we click to save each entry it seemed to fix them. I suspect that we missed clicking save on a few entries the first time around because I have no explanation why the issue would come back once but not again (I SWEAR we clicked through the list and re-saved all entries, but we all know end users lie :) )

If I see anything further I'll post those details up here, but as of now there's nothing further to troubleshoot.

Thanks again to everyone that took time to review and offer suggestions.

--Andrew

cneep

@andrewduey, for what it's worth...I experienced what appears to be this same problem on two sets of CARP firewalls in two different networks with similar configurations to each other shortly after upgrading all four of them from 2.4.4 to 2.5.x. I have several more very similarly-configured firewalls to upgrade at my various clients, but have stopped doing any additional upgrades for now because it appears upgrading will invariably break an important part of most of my clients' networks.

My temporary work-around has been to disable Captive Portal for now in these two networks, since not having CP running is preferable to having it running but breaking the devices that need to pass the CP. I haven't done as much diagnosing as it appears you've done but am curious to carefully read back through this thread and see if I can discover/add anything useful.

I'm only chiming in to say that you're not alone, since that appears to be in question! I noticed the problem right away too, but only just now had a spare few minutes to hit the forums to see if there was a known solution to the problem.

FWIW.

AndrewDuey

@cneep said in Captive Portal blocking white listed MAC addresses in 2.5.0:

@andrewduey, for what it's worth...

Glad someone else saw it too.

I'm not sure it's enough to go on, but our firewalls are also setup with CARP. I wouldn't think that would have much to do with it since we're not failing over (that I am aware of and I'm pretty sure we're not as we have notifications enabled).

Between this in 2.5.0 and Multi-WAN NAT broken by 2.5.1 we've had a rough go recently with pfsense firmware upgrades :(

AndrewDuey

Still having the same issues nearly a year later. By this time we've confirmed that every time the firewall acts up then we just re-save the MAC address in the captive portal zone. No changes, just clicking the save button again changes it to allow that MAC address even though it's already on the list. We have about 30 devices and we'll need to click edit on each one and save it.

According to the helpdesk staff, they have to do each entry. Doing one does not cause any others to work, only the one you just saved.

The firewall is now running 21.05.2 and still seeing the issue. I was hoping it would be cleared up in a minor release but hasn't yet.

Any other ideas would be appreciated.

michmoor

@andrewduey did you open a redmine tracker for this?

AndrewDuey

@michmoor Nope, I hadn't yet as previous posters seemed to think it wasn't a bug, but an issue with our setup. But I was thinking it's about time since I think we've ruled everything else out.

michmoor

@andrewduey so before I post a redmine issue I do the following:

1. post to netgate forums - usually not the most reliable place to seek help.
2. post to pfsense reddit forums - usually a more responsive crowd but quality of support varies
3. open redmine

Hate to say it but option 2 usually works more for me than the official channel here but who knows you may get lucky. I haven't had that issue show up in my portal deployments.

AndrewDuey

@michmoor I'll check in on the r/pfsense reddit and see what I hear. I was hesitant to put it in redmine until after it was confirmed to be a bug that other people were seeing and not some sort of a configuration issue (since redmine isn't there for support).

But after hanging it out and having a few others review what I'm seeing, I'm pretty comfortable saying it's a bug - even if there's not a lot of others seeing it.