Logging URLs

Lockie

Hi,

A little while ago I posted this:
https://forum.netgate.com/topic/168568/logging-urls

However, since it's had zero replies, I thought I'd ask again in another section.

I'd like to learn about, and also put into practice, a method of logging and viewing the URLs visited within my network. Even better would be the ability to view entire URLs, not simply the high-level domain (or whatever the correct term is).

I'm aware that one way of doing this is Squid and LightSquid. However, this method seems rather complex and, from what I've read, has a number of drawbacks around maintenance and certificates.

Given how powerful pfSense is, I'm surprised there isn't a simpler, more focused method of capturing URLs, or even a package dedicated to doing so, but of course I don't fully understand networking, so my assumptions are ignorant.

Back to the question: is there a simpler method than Squid for logging and viewing URLs? Is there a newer package, perhaps not well known, that makes it easier? Please help.

Many Thanks

johnpoz

@lockie said in Logging URLs:

What would be even better is to be able to view entire URLs, not simply the high level domain

And how would you accomplish this over an HTTPS connection? With cleartext HTTP it might be possible to sniff the traffic and log all the GETs, plus some magic to keep track of the full parent path, since once a connection is made the actual GET doesn't have to include the full URL; it could just be the relative path, etc.
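For cleartext HTTP/1.1, the request line usually carries only the relative path, and the host name comes from the separate Host header, so a sniffer has to stitch the two together. The sketch below shows that step, assuming the whole request header arrives in one buffer and the traffic is plain HTTP/1.1 (not HTTP/2's binary framing). The function name `full_url_from_request` is hypothetical, not part of pfSense or any package.

```python
def full_url_from_request(raw: bytes) -> str:
    """Rebuild the absolute URL from one cleartext HTTP/1.1 request (hypothetical helper)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    _method, target, _version = request_line.split(" ", 2)
    # Requests sent to an explicit proxy may already carry an absolute URL.
    if target.startswith(("http://", "https://")):
        return target
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # The GET line only has the path; the Host header supplies the name.
    return f"http://{headers.get('host', '')}{target}"


raw = (b"GET /whatever/something.html HTTP/1.1\r\n"
       b"Host: www.domain.tld\r\n"
       b"User-Agent: demo\r\n\r\n")
print(full_url_from_request(raw))  # -> http://www.domain.tld/whatever/something.html
```

With HTTPS, none of these bytes are visible on the wire, which is the point being made here.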

With HTTPS, the FQDN can be seen in the handshake (the SNI field) unless ESNI (encrypted SNI) or its newer replacement ECH (Encrypted Client Hello) is being used, so it could be possible to sniff that info. But once the secure connection is established, any further info on the URL, like www.domain.tld/whatever/something.html, is inside the encrypted connection, and pfSense has no way of seeing it unless you are doing full MITM (man-in-the-middle) interception of all the traffic.
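To illustrate what "seen in the handshake" means, here is a minimal sketch that pulls the server_name (SNI) extension out of a TLS ClientHello. It assumes the whole ClientHello sits in one untruncated TLS record; `extract_sni` and the demo builder `build_client_hello` are hypothetical names, and the builder produces a synthetic hello only for testing. Note that it yields a host name at most, never a path, and with ECH the visible SNI is only the provider's public name.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a TLS ClientHello record, or None."""
    if len(record) < 6 or record[0] != 0x16 or record[5] != 0x01:
        return None                      # not a handshake record / not a ClientHello
    pos = 5 + 4                          # record header + handshake type and 3-byte length
    pos += 2 + 32                        # client_version + random
    pos += 1 + record[pos]               # session_id
    (n,) = struct.unpack_from("!H", record, pos)
    pos += 2 + n                         # cipher_suites
    pos += 1 + record[pos]               # compression_methods
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0x0000:           # server_name extension
            # Skip 2-byte list length and 1-byte name_type, then read 2-byte name length.
            (name_len,) = struct.unpack_from("!H", record, pos + 3)
            return record[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


def build_client_hello(host: str) -> bytes:
    """Synthetic ClientHello carrying only an SNI extension (demo only)."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name     # name_type = host_name
    sni_data = struct.pack("!H", len(entry)) + entry           # server_name_list
    ext = struct.pack("!HH", 0x0000, len(sni_data)) + sni_data
    body = (b"\x03\x03" + bytes(32)                            # version + random
            + b"\x00"                                         # empty session_id
            + b"\x00\x02\x13\x01"                             # one cipher suite
            + b"\x01\x00"                                     # null compression only
            + struct.pack("!H", len(ext)) + ext)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(extract_sni(build_client_hello("www.domain.tld")))  # -> www.domain.tld
```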

The only way to log URLs is to do so at a proxy, and for HTTPS that proxy has to do full MITM. That isn't easy, and without full control of the browser or device, so that it trusts the MITM certs you create, it's just plain impossible.
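A quick way to see the difference at the proxy is to look at what a Squid access log records. The sketch below parses lines in Squid's native access.log layout (time, elapsed, client, code/status, bytes, method, URL, ...). The two sample lines are hypothetical, made up for illustration. Without SSL-Bump (Squid's MITM feature), an HTTPS request only shows up as `CONNECT host:port`, so the path is never logged.

```python
from typing import NamedTuple


class ProxyHit(NamedTuple):
    client: str
    method: str
    url: str
    full_url_visible: bool


def parse_squid_line(line: str) -> ProxyHit:
    """Parse one line of Squid's native access.log format (hypothetical helper)."""
    # time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost type
    fields = line.split()
    client, method, url = fields[2], fields[5], fields[6]
    # A tunnelled (non-bumped) HTTPS request is just "CONNECT host:port".
    return ProxyHit(client, method, url, method != "CONNECT")


SAMPLE = [  # hypothetical log lines
    "1650000000.123    245 192.168.1.20 TCP_MISS/200 5120 GET "
    "http://www.domain.tld/whatever/something.html - HIER_DIRECT/203.0.113.5 text/html",
    "1650000001.456  60012 192.168.1.20 TCP_TUNNEL/200 48213 CONNECT "
    "www.domain.tld:443 - HIER_DIRECT/203.0.113.5 -",
]
for hit in map(parse_squid_line, SAMPLE):
    print(hit.method, hit.url, hit.full_url_visible)
```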

Without something like a proxy, the best that can really be done is logging DNS queries. This will get you the top level of where a client wants to go, say www.somedomain.tld, but any details below that would not be in the DNS query. And just because a DNS query is made doesn't always mean the client actually went there; it could just be a test of DNS to validate something, and it might never actually make a connection there.
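To make it concrete what a DNS log can contain, this sketch builds a minimal query in RFC 1035 wire format and reads the question back out. The question section holds only a name, a record type and a class; there is nowhere for a path to go. The helpers `build_query` and `parse_question` are hypothetical, and the parser assumes no name-compression pointers in the question (queries normally don't use them).

```python
import struct
from typing import Tuple


def build_query(name: str, qtype: int = 1) -> bytes:
    """Minimal DNS query for `name` (qtype 1 = A record)."""
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)  # id, RD flag, 1 question
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)           # QTYPE, QCLASS = IN


def parse_question(packet: bytes) -> Tuple[str, int]:
    """Return (QNAME, QTYPE) from the first question of a DNS message."""
    pos, labels = 12, []                 # question section starts after the 12-byte header
    while packet[pos] != 0:              # length-prefixed labels, terminated by a zero byte
        length = packet[pos]
        labels.append(packet[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    (qtype,) = struct.unpack_from("!H", packet, pos + 1)
    return ".".join(labels), qtype


print(parse_question(build_query("www.somedomain.tld")))  # -> ('www.somedomain.tld', 1)
```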

If you want to track the URLs a client goes to, you really need to do that at the client with some software, since the client is really the only place where you can see where it wants to go, even with HTTPS traffic.

So while it might be possible to work out something for your HTTP traffic, with the vast majority of internet traffic these days being HTTPS, the info you could glean would be sporadic and incomplete at best.

dma_pf

@lockie said in Logging URLs:

Given how powerful pfSense is, I'm surprised there isn't a simpler, more focused method of capturing URLs, or even a package dedicated to doing so, but of course I don't fully understand networking, so my assumptions are ignorant.

I think it would help if you read up a bit on how routing on the internet works. Nothing that is routed on the internet actually travels across it looking for a domain name. So when you type www.google.com in your browser, the client is not sending packets out to the internet looking for www.google.com. There is no mechanism for traffic to be routed that way on the internet, so pfSense never sees packets addressed to a name that it could log.
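The point is easy to see in the IPv4 header itself: besides the protocol number, its only endpoint fields are two 4-byte addresses, with no field for a name. The sketch below builds a sample 20-byte header and reads those fields back. The addresses are hypothetical (203.0.113.0/24 is a documentation range), the helper name is made up, and the checksum is left at zero because it isn't validated here.

```python
import ipaddress
import struct
from typing import Tuple


def ipv4_endpoints(packet: bytes) -> Tuple[str, str, int]:
    """Return (source IP, destination IP, protocol number) from an IPv4 header."""
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    proto = packet[9]                              # e.g. 6 = TCP, 17 = UDP
    src = ipaddress.IPv4Address(packet[12:16])     # bytes 12-15
    dst = ipaddress.IPv4Address(packet[16:20])     # bytes 16-19
    return str(src), str(dst), proto


# version/IHL, TOS, total length, id, flags/fragment, TTL, protocol, checksum, src, dst
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0,
                     ipaddress.IPv4Address("192.168.1.20").packed,
                     ipaddress.IPv4Address("203.0.113.9").packed)
print(ipv4_endpoints(header))  # -> ('192.168.1.20', '203.0.113.9', 6)
```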

What happens is that when your browser wants to take you to www.google.com, it first has to get the actual numerical address (IP address) on the internet where Google's servers are located. So the first thing that happens is a request goes to a DNS server to get the IP address of Google's server. This can be done directly through pfSense (via Unbound), through any other DNS server configured in pfSense (via the forwarder), or directly from the client via its own configured DNS settings, which completely bypasses pfSense.

Once the IP address is retrieved, packets can be sent to it and pfSense routes them out. If you look at your firewall logs you can see all of the places pfSense has sent traffic to (assuming you have a rule that logs all outbound traffic). You will notice that you never see an FQDN in those logs, because the FQDN is never known to pfSense from the packets it sends out; all it knows is the IP address. If you click the ! by the destination IP address in the firewall logs, pfSense will go out to the internet and do a reverse DNS lookup to find the FQDN for that particular IP address. pfSense has no built-in functionality to do automated reverse DNS lookups for traffic on an interface.
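A reverse lookup is itself just another DNS query, for a PTR record under a special name built from the IP. The sketch below shows that name using Python's standard `ipaddress` module, plus an optional lookup through the system resolver. The helper names are hypothetical; whatever comes back is only the PTR the address owner chose to publish, not necessarily the name the client typed.

```python
import ipaddress
import socket
from typing import Optional


def ptr_name(ip: str) -> str:
    """Name queried in a reverse (PTR) lookup, e.g. 9.113.0.203.in-addr.arpa."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str) -> Optional[str]:
    """Ask the system resolver for the PTR record; None if there isn't one."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:                  # socket.herror / socket.gaierror are OSError subclasses
        return None


print(ptr_name("203.0.113.9"))  # -> 9.113.0.203.in-addr.arpa
```

`reverse_lookup` needs network access, so it isn't called in the example above.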

johnpoz

@dma_pf said in Logging URLs:

pfSense has no built-in functionality to do automated reverse DNS lookups for traffic on an interface.

Even if you do the reverse lookup, that is rarely going to tell you the FQDN used to access that IP, and certainly not the full URL.

Even in the days before CDNs, a server hosting a site almost always hosted multiple sites on one IP, and the reverse of that IP might be something like serverXYZ.hostingdomain.tld.

The PTR for that IP might tell you the name of the server the site is hosted on, but it would not tell you that you went to www.funstuff.com ;) and that server might host loads of other stuff, like not.funatall.net, etc.

But yeah, you're correct: the only thing the firewall/router part of pfSense knows is the IPs and ports involved in the conversation that it either allowed or blocked. The DNS part of pfSense would know the FQDN you asked for to find that IP, but again it wouldn't have a clue about the actual full URL being requested, e.g. www.funstuff.com/whatIwanttosee/index.php.
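Putting the two halves together, a rough after-the-fact approach is to index the DNS answers by IP and look up each firewall destination IP. The sketch below assumes you have already exported (fqdn, ip) pairs from DNS answers; the helper names and the data are hypothetical (192.0.2.0/24 is a documentation range). It shows the limits described above: a shared-hosting IP yields several candidate names, and nothing in either log yields a path.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def names_by_ip(dns_answers: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index (fqdn, ip) pairs taken from DNS answers by IP address."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for fqdn, ip in dns_answers:
        index[ip].add(fqdn)
    return index


def candidates_for(dst_ip: str, index: Dict[str, Set[str]]) -> Set[str]:
    """FQDNs that *might* explain a connection to dst_ip (never a URL path)."""
    return set(index.get(dst_ip, set()))


answers = [("www.funstuff.com", "192.0.2.10"),   # hypothetical: two sites, one IP
           ("not.funatall.net", "192.0.2.10")]
idx = names_by_ip(answers)
print(sorted(candidates_for("192.0.2.10", idx)))  # -> ['not.funatall.net', 'www.funstuff.com']
```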