Bug reporting rant + RADIUS authentication server incorrectly processing "Accept" messages
-
OK, I'm pretty irate right now, so I'm going to explain why before getting into the issue of this post.
I filed a bug yesterday (https://redmine.pfsense.org/issues/10595#change-46445) which clearly points out there is an issue with the RADIUS client authentication mechanism inside of pfSense. I provided the evidence necessary to show it happening and even provided some of the troubleshooting steps I went through to rule out other issues. Instead of responding with something like "can you provide us dump output of XXX", or making any further effort to collect information, I get a response of "well, works fine here, closed". This is the surest way to encourage me to never pay for pre-built boxes or service contracts (which are still ridiculously priced). As someone who has been using pfSense since its original fork from m0n0wall, I cannot express how angry I am at the lack of professionalism shown right now.
---- end rant, begin debug ----
Ok, so to consolidate information, I'm including the details provided in the original bug report here:
The internal RADIUS authentication mechanism is failing to acknowledge received "Accept" messages from a RADIUS server in 2.4.5-RELEASE. As a result, all systems relying on the authentication mechanism are rejecting successful authentication requests.
The attached tcpdump screenshot shows the pfSense router (WAN - 192.168.1.30) attempting to authenticate against the RADIUS server (192.168.1.6) via the "Diagnostics/Authentication" tool. We see a successful reply message, but the pfSense box retries anyway, for a total of three attempts. (For the purpose of network communications flow information, the pfSense box is not being used to manage traffic on the 192.168.1.x network and has a local LAN of 192.168.10.x.)
The second screenshot shows the "Diagnostics/Authentication" tool reporting an authentication failure, and the third screenshot shows the associated "system log" entry claiming that no response was received.
In a last-ditch effort, an "any/any" UDP rule was configured on the WAN interface to test whether the stateless nature of UDP was causing replies to be blocked. No success. Additionally, there were never any firewall log entries reporting blocked traffic, either before or after the rule modifications.
-
Environment details:
VMware 6.5.0 Update 2
VMXNet3 NICs x2
Packages:
Open-VM-Tools v10.1.0_2,1
openvpn-client-export v1.4.23
(tcpdump showing a successful request to and response from the RADIUS server)
(Diagnostics/Authentication screen reporting failure, despite the "Accept" packet being received)
(System general log claiming no response was received)
-
It works for me, and many others, so we need to figure out why it doesn't work for you. It doesn't matter much if it worked before, you need to focus on what is happening now. There just isn't enough information to say why it's happening, let alone enough information to say it's a bug. Redmine isn't the place to discuss this kind of stuff, the forum is, which is why you were directed here.
The text/images you posted here are the same as on the issue you reported. We need even more information. What you posted is not enough. That low-detail packet capture output doesn't tell us things like "is that a local interface or a VPN?" and "the IP address matches, but what about the MAC address?", or things like packet size, interface errors, etc., or even "the RADIUS data in that reply was not valid".
You need to post a packet capture with as much detail as possible (but with private info like passwords redacted). Take a capture, download it, load it in Wireshark and analyze it there.
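One specific thing worth checking in the capture is whether the reply's Response Authenticator is valid. Per RFC 2865, a client silently discards a reply whose authenticator does not match, which from the client's side looks like no response at all. Below is a minimal Python sketch of that check, not how pfSense itself does it. The function name and the way you feed it bytes are assumptions for illustration; you would copy the RADIUS payload of the reply and the 16-byte authenticator of the matching request out of Wireshark yourself.

```python
import hashlib
import struct


def response_authenticator_ok(reply: bytes, request_authenticator: bytes, secret: bytes) -> bool:
    """Check an RFC 2865 reply (Accept/Reject/Challenge) against its request.

    reply                 -- raw RADIUS payload of the reply (UDP data only)
    request_authenticator -- 16-byte Authenticator field from the Access-Request
    secret                -- shared secret configured on client and server
    """
    if len(reply) < 20 or len(request_authenticator) != 16:
        return False

    # Header: Code (1 byte), Identifier (1 byte), Length (2 bytes, big-endian)
    _code, _identifier, length = struct.unpack("!BBH", reply[:4])
    if length < 20 or length > len(reply):
        return False

    attributes = reply[20:length]

    # ResponseAuth = MD5(Code + ID + Length + RequestAuth + Attributes + Secret)
    expected = hashlib.md5(
        reply[:4] + request_authenticator + attributes + secret
    ).digest()
    return expected == reply[4:20]
```

If this returns False with the secret you believe is configured, the secret (or the packet contents) differs between the two ends, and the client would be expected to ignore the reply and retry.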
Also check your states table and see if you see entries corresponding to the RADIUS request/reply and also what those look like. You could get this from the CLI with more detail as well, for example:
pfctl -vvss | grep -A2 :1812
Also make sure there aren't any features active which may be getting in the way, such as Captive Portal on that interface, IPsec overlapping that subnet, pfBlockerNG or an IDS like snort/suricata, and so on.
-
I'll run those through when I get a break to touch this again.
As stated in the original ticket, the packet capture is from the WAN interface of the pfSense box. It is also a default out-of-the-box new install with only the packages listed. The only service configured is OpenVPN, and all subnets are happy (no overlap).
The WAN interface would not have seen the response packet if it was addressed to the wrong MAC address, since it's behind a switch (and this is not broadcast traffic). For completeness: yes, checking the MAC addresses was something done very early on. Additionally, regular traffic flow to/from/through the pfSense box is fine. In fact, I bypassed the RADIUS auth mechanism as a stopgap measure and users are now able to VPN in and use systems located on both the WAN and LAN networks as expected.
Interesting point about checking the state table. Since UDP is 'stateless', it didn't even dawn on me to look there for data, even more so since it's such a short-lived transaction.
Edit: I see I did not clearly spell out in earlier posts that this capture was from the WAN interface on the pfSense box. I agree that it isn't something to be assumed.
-
Bring the packet capture into Wireshark so you can see the actual Access-Request and Access-Accept traffic.
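If you want a quick sanity check outside of Wireshark, here is a rough Python sketch that decodes the RADIUS header and attribute list from a raw UDP payload. The function name and the returned dict layout are made up for illustration, and it only names the four codes relevant here. The main thing to look at is that the reply's Identifier matches the request's, and that the Length field and attributes are well formed.

```python
import struct

# Only the packet codes relevant to this thread (RFC 2865)
RADIUS_CODES = {
    1: "Access-Request",
    2: "Access-Accept",
    3: "Access-Reject",
    11: "Access-Challenge",
}


def summarize_radius(payload: bytes) -> dict:
    """Decode the RADIUS header and walk the attribute list of one packet."""
    if len(payload) < 20:
        raise ValueError("shorter than the 20-byte RADIUS header")

    code, identifier, length = struct.unpack("!BBH", payload[:4])
    if not 20 <= length <= len(payload):
        raise ValueError("Length field does not fit the payload")

    attributes = []
    offset = 20
    while offset < length:
        if offset + 2 > length:
            raise ValueError("truncated attribute header")
        attr_type = payload[offset]
        attr_len = payload[offset + 1]  # includes the 2-byte type/length header
        if attr_len < 2 or offset + attr_len > length:
            raise ValueError("malformed attribute length")
        attributes.append((attr_type, payload[offset + 2:offset + attr_len]))
        offset += attr_len

    return {
        "code": code,
        "code_name": RADIUS_CODES.get(code, "other"),
        "identifier": identifier,
        "length": length,
        "authenticator": payload[4:20],
        "attributes": attributes,
    }
```

Run it on both the request and the reply payloads and compare the "identifier" values; a reply with a different Identifier does not belong to that request.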
-
What is your RADIUS server?
FreeRADIUS or AD?
Any 2FA features (like DIGIPASS)?
Can you test it with a simple shared secret and username/password (like '123')?