Is this a bug? Hostname Underscore

NOYB

Edit Static Mapping - Hostname (services_dhcp_edit.php)

is_hostname function (util.inc) calls is_domain function (util.inc) which correctly permits underscores in domain name labels. However, are underscores valid in hostnames? Didn't think they were.

johnpoz

See https://tools.ietf.org/html/rfc2181

It pretty much removes many of the character restrictions on DNS labels, so you're free to use _ if you want, and pretty much anything else goes as well.

These are the rules, I believe, for host names:

A host name (label) can start or end with a letter or a number
A host name (label) MUST NOT start or end with a '-' (dash)
A host name (label) MUST NOT consist of all numeric values
A host name (label) can be up to 63 characters

And spaces are not allowed. But per the newer RFC, yes, you can use underscores ("_"). Just don't be surprised if some clients or applications haven't gotten the memo that it's OK ;)
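
As a rough illustration of the rules listed above, here is a minimal Python sketch of a per-label check. The function name `is_hostname_label` and the `allow_underscore` switch are made up for this example; this is not pfSense's is_hostname from util.inc. It also assumes that an underscore, when allowed, is only accepted inside a label and not as its first or last character.

```python
import re

ALNUM = "A-Za-z0-9"


def is_hostname_label(label, allow_underscore=False):
    """Check one host name label against the rules listed in the post.

    Hypothetical helper for illustration only.
    """
    # Rule: a label can be up to 63 characters.
    if len(label) > 63:
        return False
    # Rule: a label must not consist of all numeric values.
    if re.fullmatch(r"[0-9]+", label):
        return False
    # Interior characters: letters, digits, '-', and optionally '_'.
    # The hyphen is placed last so it is a literal inside the class.
    inner = ALNUM + ("_" if allow_underscore else "") + "-"
    # Rule: must start and end with a letter or digit. This also rejects
    # spaces, empty labels, and a leading or trailing '-'.
    pattern = rf"[{ALNUM}](?:[{inner}]*[{ALNUM}])?"
    return re.fullmatch(pattern, label) is not None


if __name__ == "__main__":
    for candidate in ("printer-01", "printer_01", "-printer", "12345"):
        print(candidate, is_hostname_label(candidate),
              is_hostname_label(candidate, allow_underscore=True))
```

The switch is there only to show that the underscore question is a single character-class decision; everything else in the rules stays the same either way.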

dennypage

DNS does not equal host name.

RFC 2181 clarifies the restrictions (or lack thereof) inherent in DNS itself. It does not speak to higher-level restrictions on the data housed in the DNS database. The host name restrictions specified in RFC 952 and RFC 1123 are still pertinent.

jimp

We go back and forth about this every couple years. By all indications, they work but may not be considered "valid". Is there something specific that you know of that will not accept an underscore? I think much of the confusion came about when Windows allowed it first in some sort of "Embrace and Extend" push they did back in the day with hostname resolution (like NetBIOS) which allowed underscores.

If a host won't accept a name with an underscore, then it's a limitation of its OS (probably rightly enforcing the RFCs). But with RFC 2181 a hostname in DNS can have an underscore, so even if some host can't accept "_" in its own hostname, it still has to understand that other hosts may use one.

Static mappings are a bit of a grey area. The client OS doesn't have to accept a hostname with "_", it can keep using the hostname it wants. Meanwhile, that name can be used as valid in DNS if you register static mappings with the forwarder/resolver.

So ultimately the question of whether or not it's valid is irrelevant on that screen since even if it was not valid for a hostname, it is for DNS, which is a common use case for that field.

The only place where we might want to reject "_" in hostnames is in System > General, but FreeBSD allows "_" in hostnames, so there is no incentive to reject it in our GUI except to be pedantic wrt the RFC. And then someone who uses "_" in their firewall hostnames and has been working fine for years will complain that it broke.

tl;dr: Let it slide.

dennypage

The problems aren't as bad as they used to be, but many higher-level systems (Java, Python, etc.) still have validators that fail when underscores are present in host names. This can result in some very obscure problems.

jimp

Maybe so, but that wouldn't have anything to do with the firewall or what it can/should accept.

From the perspective of a separate high level system, they should be valid as they would be dealing with hostnames as seen by DNS, unless I'm not understanding the context. Even for an IPAM system, it shouldn't be validating that way, unless it's optional to conform to a chosen site/org standard.

If a program is told to connect to a host using DNS and the DNS name contains an underscore and it fails, that's definitely a bug in the program at this point in time (thanks to RFC 2181).

dennypage

@jimp:

If a program is told to connect to a host using DNS and the DNS name contains an underscore and it fails, that's definitely a bug in the program at this point in time (thanks to RFC 2181).

Nope, not a bug.

To be very clear, RFC 2181 does not remove the restriction on underscore in host names.

RFC 2181 speaks to DNS itself, not to the data carried in DNS. DNS is a database. Host names are data in the database. People tend to equate DNS and host names because the vast majority of data carried in DNS is host name related. But they are not the same. And their restrictions are not the same. This is noted in RFC 2181.

The Wikipedia article on host name restrictions has a good description that calls out the difference:

https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names

jimp

I'm aware of the distinction, but given how loose most people seem to be with validation… Again: Why bother?

Even Namecheap's DNS front end allows me to enter them for an A record, and I can resolve them fine all the way through to my workstation.

The classical reason it was rejected, "It was shift -, and DNS is not case sensitive", doesn't make much sense. It works in practice, and people are actively using it, so why should we reject it, except to strictly follow the RFC when practically nobody else is? We'd lose functionality, get complaints, etc. It's a losing proposition.

dennypage

I am not saying that pfSense needs to expressly prohibit it. I am saying that it cannot be claimed to be a bug in others' code if problems arise from its use.

jimp

If they want to strictly point at the RFC and claim that, sure, but the fact is, the cat's out of the bag. People are already doing it, and to not accept that and allow it at this point is probably worth calling a bug. Consumers of the program will eventually demand they "fix" it, and depending on the program, resisting the change is most likely not in their best financial or general interest.

jimp

To clarify some points:

1. By the letter of the RFC sections that have not been replaced or superseded, underscore is not allowed in hostnames
2. Despite #1, DNS servers, clients, host operating systems, etc (including pfSense), have been allowing underscore in hostnames. Not in line with the original RFC, but with what is in actual use and reflecting reality. Most likely because underscores are now allowed in domain names and service record names, and people are generally lazy and don't want to overdo the validation code.
3. It works fine in many cases, and is in active use
4. Therefore, anyone in this day and age sticking to the letter of the RFC is being pedantic, and likely doing a disservice to their users. I'd call that a bug, others may not. Agree to disagree.

IDN/IDNA, Unicode, and other standards most likely make it all even more confusing and hard to validate. The standards track documents like RFC 5890 call each section of the FQDN a "label" and treat them equally, while also referencing RFC 952 at times. It's as if nobody wants to come out and say it, but they know nobody is actually paying attention to the old spec now.

And this from RFC 2181:

The DNS itself places only one restriction on the particular labels
that can be used to identify resource records. That one restriction
relates to the length of the label and the full name. The length of
any one label is limited to between 1 and 63 octets. A full domain
name is limited to 255 octets (including the separators). The zero
length full name is defined as representing the root of the DNS tree,
and is typically written and displayed as ".". Those restrictions
aside, any binary string whatever can be used as the label of any
resource record. Similarly, any binary string can serve as the value
of any record that includes a domain name as some or all of its value
(SOA, NS, MX, PTR, CNAME, and any others that may be added).
Implementations of the DNS protocols must not place any restrictions
on the labels that can be used. In particular, DNS servers must not
refuse to serve a zone because it contains labels that might not be
acceptable to some DNS client programs. A DNS server may be
configurable to issue warnings when loading, or even to refuse to
load, a primary zone containing labels that might be considered
questionable, however this should not happen by default.

Using that interpretation, the hostname is just another label part of the record name, and the only restriction is length. And pfSense must not refuse to allow a record just because some client might not like it.
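
To make that reading concrete, here is a minimal Python sketch of a check that enforces only the length limits from the quoted RFC 2181 passage and nothing about which characters appear. The function name `dns_name_ok` is hypothetical. Two assumptions are baked in: labels are counted as UTF-8 octets, and the 255-octet limit is counted the way the wire format does it (one length octet per label plus the terminating root octet). Escaped dots inside labels are not handled.

```python
def dns_name_ok(name):
    """Length-only check: any octets are fine, only sizes are limited.

    Hypothetical helper for illustration only.
    """
    # The zero-length full name is the root, written as ".".
    if name == ".":
        return True
    # Drop one trailing dot of a fully qualified name.
    if name.endswith("."):
        name = name[:-1]
    # Assumption: count each label as UTF-8 octets.
    labels = [label.encode("utf-8") for label in name.split(".")]
    # Each label must be between 1 and 63 octets.
    if any(not 1 <= len(label) <= 63 for label in labels):
        return False
    # Assumption: wire-format accounting, one length octet per label
    # plus one octet for the root label at the end.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    return wire_length <= 255


if __name__ == "__main__":
    for candidate in ("hp_printer.example.com", "_sip._tcp.example.com.", "a..b"):
        print(candidate, dns_name_ok(candidate))
```

Under this check a name with underscores passes just like any other, which is the point being argued: at the DNS layer the only thing to validate is length.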

So if it works on your application or operating system, have at it. We won't get in the way. If you have something that refuses to let it work, you can accommodate it manually by not using underscores.

NOYB

If it violates RFC then it's a bug. The fact it is being widely used and accommodated does not change that. The devices, code, people, etc. that are using it and accommodating it are in error.

This is how de facto standards end up coming about and creating incompatibilities.

On the page referenced in the OP the "hostname" portion only is explicitly being requested. Not a domain name, FQDN, label, or DNS record.

Now that a de facto standard has permeated the environment, it becomes a matter of practicality whether or not to enforce the industry standard. From the practicality perspective I personally don't really have a dog in the show one way or the other.

I do think though that there needs to be a clear, agreed-upon industry standard and that it should either be followed or changed. Not left up to people's whims about what they want to do.

If those not following the standard are negatively impacted by the standard being enforced, then that is on them, and if they don't like it then they should put in the work to get the standard changed to suit them.

Not enforcing the standard also impacts those who do follow it. Yes because pfSense isn't enforcing the standard and accepted an invalid hostname containing underscores from a DHCP client, I had to troubleshoot an accessibility problem with an application that correctly enforces the standard.

jimp

How I read the section I quoted, in context with the other RFCs is:

1. If you are a host, you probably shouldn't allow _ in your own hostname – This is all you can realistically control
2. If you are a DNS client, you should resolve whatever you're asked to resolve
3. If you are a DNS server, you must allow anything anywhere in both a record name and data in any part (within the stated length restrictions)

But it's not clearly stated, and with having to correlate a bunch of RFCs (including some not formally accepted like the standards track IDN/IDNA ones), it's very confusing for everyone.

Which is why I fall back to: There is no benefit to being strict at this point in time, and if we were to be strict after being lenient, it would only generate complaints and wouldn't accomplish anything meaningful.

dennypage

There is a #4, which is all the infrastructure and applications that use host names. This is by far the biggest source of issues.

I agree that #2 and #3 are clear. DNS server and DNS resolver implementations are supposed to be blind to the meaning of the data being resolved and not impose any restrictions beyond RFC 2181. However, it is up to the higher layers to determine if the data returned is considered valid in context. In simple command line terms, dig is data blind and must return the result of the query without interpretation; however, that doesn't mean that an application like ssh must honor the result as a valid host name.

Although DNS servers are supposed to be data blind, they aren't always. The most commonly used name server is still BIND. One of the reasons that it isn't liked by some is that it has a lot of features that extend beyond the basic function of answering queries. Host name validation is one of these features. BIND will actually inspect A and AAAA records for validity, warning about and optionally ignoring host names containing underscores. Some would consider this a convenient feature, others might consider this a violation of RFC 2181. :)
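
The "convenient feature vs. violation" split comes down to what a server does once it notices a questionable name. Here is a minimal Python sketch of that idea as a three-way policy. It does not reproduce BIND's implementation or its option names; `check_name` and the policy values "accept", "warn", and "reject" are hypothetical, and the character rule used is the plain letters-digits-hyphen form.

```python
import re

# Letters, digits, and hyphen; must start and end with a letter or digit.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def check_name(name, policy="warn"):
    """Return (accepted, warnings) for a record owner name.

    Hypothetical helper for illustration only.
    """
    if policy not in ("accept", "warn", "reject"):
        raise ValueError(f"unknown policy: {policy}")
    # Drop one trailing dot of a fully qualified name.
    if name.endswith("."):
        name = name[:-1]
    bad = [label for label in name.split(".") if not LDH_LABEL.fullmatch(label)]
    # "accept" is the data-blind behavior: serve whatever is there.
    if not bad or policy == "accept":
        return True, []
    message = f"{name}: questionable label(s): {', '.join(bad)}"
    # "warn" keeps the record but tells the operator about it.
    if policy == "warn":
        return True, [message]
    # "reject" drops the record.
    return False, [message]


if __name__ == "__main__":
    for policy in ("accept", "warn", "reject"):
        print(policy, check_name("hp_printer.example.com", policy))
```

A warn-by-default setup keeps the name resolvable while still surfacing it, so nobody has to find out the hard way from an application that is strict about host names.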

dennypage

Here is an interesting piece of history:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=176093

FreeBSD responding to Windows violating the RFCs.

What makes it particularly entertaining is that while the Windows operating system was actively violating the RFCs, Internet Explorer was actively enforcing them. I haven't used Windows in years so I can't confirm, but it appears that they still are enforcing the RFCs in IE:

https://connect.microsoft.com/IE/feedback/details/853796/internet-explorer-wont-save-cookies-on-domains-with-underscores-in-them

dennypage

@NOYB:

Yes because pfSense isn't enforcing the standard and accepted an invalid hostname containing underscores from a DHCP client, I had to troubleshoot an accessibility problem with an application that correctly enforces the standard.

Btw, what was the application? Is it Java based?

NOYB

@dennypage:

@NOYB:

Yes because pfSense isn't enforcing the standard and accepted an invalid hostname containing underscores from a DHCP client, I had to troubleshoot an accessibility problem with an application that correctly enforces the standard.

Btw, what was the application? Is it Java based?

You already know what the application was. You mentioned it and posted a link to the "non-bug" in your previous post.

https://connect.microsoft.com/IE/feedback/details/853796/internet-explorer-wont-save-cookies-on-domains-with-underscores-in-them

So consider it verified that IE is still enforcing the RFC. Like so many others should be.

In my case it was an HP printer that was issuing an invalid hostname containing underscores. IE would open the printer's built-in web page, but the page would not work correctly because IE wasn't saving the cookies. I had to work around it by accessing the printer by IP address instead, until I figured out what the issue was. It would have been much more obvious if pfSense had refused to register the invalid hostname provided by the client. Fortunately the latest printer firmware doesn't allow or use underscores in the hostname. In my opinion pfSense shouldn't accept underscores in hostnames either. They are not valid per spec. Just because people mistakenly/incorrectly/ill-advisedly use underscores in hostnames does not make them valid.

If people want to operate outside of spec then they should be ready and willing to bear the burden when the spec is enforced.