NTP and Manual Outbound NAT Issues

planedrop

Alrighty, so I'm trying to figure out something that has been puzzling me for a while. There are a few threads from years back (referenced below) that have a fix for this issue, but the reason the fix works doesn't really make sense to me.

I have manual outbound NAT set up instead of automatic (not going to get into why I'm not using automatic). While I was getting my NTP server set up, it kept saying the NTP servers weren't reachable, even though they 100% were: from multiple VLANs I could run ntpdate -q and query every server I have configured just fine.

Now, adding a manual outbound NAT rule for "This Firewall (self)" fixed it, and so did enabling NTP to listen on WAN, just like for the people in these two posts:

https://forum.netgate.com/topic/127340/solved-ntpd-on-vlan-sub-interface/4

This one claims that pfSense should select the lowest interface (if WAN isn't being listened on). However, I do have NAT configured for my lowest interface and for 127.0.0.0/8, and, like this user, it still wasn't working.

https://forum.netgate.com/topic/131506/ntp-not-working-solved-totally/24

In this post, @jimp finally resolved it by suggesting an outbound NAT rule for "This Firewall (self)".

What I'm trying to figure out, though, is WHY this works. As long as 127.0.0.0/8 and all of my VLANs/subnets are set up with proper outbound NAT (which they are), I feel like this shouldn't be required. What IP is the firewall binding to/using for NTP requests other than an interface address? And why in the world does selecting WAN in the NTP listening settings also fix the issue?

This part doesn't really make sense to me, and I'm hoping someone from these two threads, or anyone in general, can help clarify it. I REALLY like knowing how things work in depth, which is something I love about pfSense: everything is very transparent to the user, without much "invisible" automation, but this seems a little odd. Additionally, do people with automatic outbound NAT not have this issue? If so, does that mean automatic mode effectively creates a "This Firewall (self)" rule that isn't created when you switch to manual?

Tagging a few people from these threads here:
@johnpoz
@beremonavabi
@ZsZs

jimp

It's easy to answer your question yourself based on your own data.

Look at /tmp/rules.debug with and without the extra manual outbound NAT rule. See what the difference is.
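A minimal Python sketch of that comparison, assuming you have saved two copies of /tmp/rules.debug, one from before and one from after adding the "This Firewall (self)" rule, under the hypothetical names rules.debug.before and rules.debug.after. It uses the standard difflib module and prints only the added and removed lines; a plain diff of the two files on the firewall shell does the same job.

```python
import difflib
from pathlib import Path


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the added (+) and removed (-) lines between two rule sets."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return [
        line
        for line in diff
        # Keep real changes, drop the "---"/"+++" file headers.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    # Hypothetical file names for the two saved copies of /tmp/rules.debug.
    before = Path("rules.debug.before")
    after = Path("rules.debug.after")
    if before.exists() and after.exists():
        for line in changed_lines(before.read_text(), after.read_text()):
            print(line)
```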

Look at the contents of the state table and see what NTP traffic is there with/without the extra NAT rule in place.

Look at the NTP config and see what it's allowed to bind on. Look at sockstat | grep ntp and see what it actually bound to.
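If you want to pull the bound addresses out of sockstat output programmatically, here is a small Python sketch. It assumes the default sockstat column layout (USER, COMMAND, PID, FD, PROTO, LOCAL ADDRESS, FOREIGN ADDRESS); the sample output is made up for illustration, reusing the addresses reported later in this thread, with invented PID and FD values. Only IPv4 UDP sockets owned by ntpd are kept.

```python
# Illustrative sockstat output (hypothetical PIDs/FDs; addresses from the thread).
SAMPLE = """\
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
root     ntpd       41234 20 udp4   127.0.0.1:123         *:*
root     ntpd       41234 21 udp4   10.10.10.1:123        *:*
root     ntpd       41234 22 udp4   10.150.10.1:123       *:*
unbound  unbound    40111 5  udp4   127.0.0.1:53          *:*
"""


def ntpd_ipv4_bindings(sockstat_output: str) -> list[str]:
    """Return the local IPv4 addresses ntpd has UDP sockets bound to."""
    addrs = []
    for line in sockstat_output.splitlines():
        fields = line.split()
        # Skip the header (absent if piped through grep) and short lines.
        if len(fields) < 6 or fields[0] == "USER":
            continue
        command, proto, local = fields[1], fields[4], fields[5]
        if command == "ntpd" and proto == "udp4":
            addrs.append(local.rsplit(":", 1)[0])  # strip the ":port" suffix
    return addrs


if __name__ == "__main__":
    print(ntpd_ipv4_bindings(SAMPLE))
```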

Picking the WAN means it could bind to WAN and wouldn't need NAT to get out.

If it can't use WAN, it would pick another address "closest" to the egress point since it is UDP, but it's hard to say what that might be without more info about your config.

tl;dr it needs it because your config won't let it get out otherwise, but there isn't enough info here to say why specifically.

planedrop

@jimp Yeah I will do some more digging on this for sure, I'm very curious myself.

It's just odd to me because everything I've seen says it binds to the lowest interface, but even if it doesn't, literally every interface I have has an outbound NAT rule that covers it, so it seems weird to have to explicitly include "This Firewall (self)".

I'll check out sockstat and see what I can find.

Your response about the WAN makes total sense though, so that concludes that portion.

jimp

If you look at the state table right now and check what NAT translation it's doing for traffic going to your NTP servers, that would answer a lot of your questions right there. The local part shows what NTP is using as the source.

planedrop

@jimp Hey, thanks a ton for the help here. I guess I could have dug into this more on my own; it's just been a tiring few weeks lol.

So yeah, it's binding to an IP that doesn't have outbound NAT, which is interesting because the only place I can find that IP defined is pfBlocker's DNSBL Virtual IP Address.

That IP is bound to the interface with the lowest subnet, so that's probably the reason (pfB creates a virtual IP and binds it to the interface selected as the Web Server Interface).
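The underlying check can be done mechanically: a source address only gets translated if some outbound NAT rule's source network contains it. This Python sketch uses the addresses reported later in the thread (127.0.0.1, 10.10.10.1 and the pfBlocker VIP 10.150.10.1). The 10.10.10.0/24 mask is an assumption, and the check only covers source containment, not the interface, destination, or other fields a real pf NAT rule also matches on.

```python
import ipaddress

# Outbound NAT source networks: loopback plus the lowest interface subnet.
# The /24 mask is assumed; add your other VLAN subnets here.
NAT_SOURCES = [ipaddress.ip_network(n) for n in ("127.0.0.0/8", "10.10.10.0/24")]


def uncovered(sources, nat_networks=NAT_SOURCES):
    """Return the source addresses that no outbound NAT source network contains."""
    return [
        s
        for s in sources
        if not any(ipaddress.ip_address(s) in net for net in nat_networks)
    ]


if __name__ == "__main__":
    print(uncovered(["127.0.0.1", "10.10.10.1", "10.150.10.1"]))
    # prints ['10.150.10.1']: the VIP falls outside every NAT source network
```

A "This Firewall (self)" source in pfSense matches the firewall's own addresses rather than a fixed subnet, which would cover a VIP like this one without listing it.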

I think I'm getting somewhere, thanks a ton!

planedrop

@jimp So I finally ran sockstat and got some interesting results. Below are the IPs it is (or can be?) bound to (I excluded IPv6 since I don't use it in this setup):

127.0.0.1
10.10.10.1
10.150.10.1 (a VIP on the same interface as the subnet above)

Any reason you can think of why it's using 10.150.10.1 instead of one of the lower addresses, like the IPv4 localhost?

I get why 10.150.10.1 might be used, since it's a virtual IP for pfBlocker on my lowest interface, but I can't figure out WHY it actually picks that one. It seems to me it should grab the subnet with the lowest number, which would be 127.0.0.1. Or, if it's picking the lowest interface, shouldn't it pick the lowest IP associated with that interface, which would be 10.10.10.1 in this case? (That's the native interface the pfB 10.150 VIP is bound to.)
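For what it's worth, a plain numeric ordering of the three bound addresses puts 10.10.10.1 first, not 127.0.0.1, and a "lowest address" rule still wouldn't produce 10.150.10.1. That is consistent with jimp's point that the choice comes from the OS's route/source selection rather than numeric order. A quick check with Python's standard ipaddress module:

```python
import ipaddress

# The IPv4 addresses ntpd reported as bound in this thread.
bound = ["127.0.0.1", "10.10.10.1", "10.150.10.1"]

# ip_address objects compare numerically, so they work as a sort key.
ordered = sorted(bound, key=ipaddress.ip_address)
print(ordered)  # ['10.10.10.1', '10.150.10.1', '127.0.0.1']
```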

Anyway, thanks for all the help here. At least I know what it's actually doing now; I might dig into the code and see if I can figure out the why.

jimp

It's hard to say exactly why it picked that but usually it's because the OS selected it as "closer" to the destination in some way.

planedrop

@jimp Interesting. Yeah, I dug through the PHP a bit, but I'm no expert at coding, so I couldn't find a reference to why it would have picked that. It just seems odd to me since it's a higher IP address than the VLAN it's bound to, and it's a VIP, so it doesn't really make sense for it to be picked.

I'll see if I can find more info on it purely because I'm curious.