[SOLVED] Corrupted PDF download through new pfSense installation
-
I have run into a very strange issue while testing a new pfSense build for my home network. When I download a file with the pfSense device in the network path, the file is corrupted. So far I've only noticed it with this one file, but since it's so reproducible, I'm afraid of becoming the house pariah (more so than usual ;D) if I go live with the new system and it turns out this is the tip of the iceberg. I've calculated the MD5 checksum of the corrupted file, and it is the same every time. With my old router I get one checksum and a viewable file; with the new router I get a different checksum and a file with missing pages. This leads me to believe that it isn't a hardware issue or some other 'normal' problem, which presumably would corrupt the file more randomly, but rather something that is consistently changing the content. The URL of the file I'm downloading is https://dl.ubnt.com/guides/UniFi/UniFi_Controller_V5_UG.pdf
My observations:
- Download this Ubiquiti manual with pfSense inline (using Chrome or Safari) = corrupted file
- Download bypassing pfSense (using the original TP-Link router) = intact file
- Download with pfSense inline (using curl) = intact file
- Sampling of other file downloads with pfSense inline (using Chrome or Safari) = apparently intact files
I've checked the LAN and WAN interfaces, and they show 0 errors. I've also done a Wireshark packet capture, and the Analyze->Expert Information menu item shows nothing remarkable (a couple of out-of-order packets and a couple of duplicate ACKs, with the good and bad downloads showing roughly the same number). Although this is an HTTPS download, I disabled the transparent Squid proxy to be safe, and I get the same bad MD5 checksum with and without Squid active.
I'm really at a loss as to how the combination of pfSense and browser could affect this particular file download, while it works with curl and with the original router regardless of whether I use a browser or curl. I'd write it off as some weird anomaly, except I don't have enough experience with the new hardware and software combination to know if I should have confidence, and also because hell hath no fury like a family with flakey internet.
Any bright ideas on what I should check next, or entertaining theories on what could be going on would be greatly appreciated.
-
-
You do realize that downloading a PDF like that via HTTPS means the firewall cannot see any of the content inside that stream? It has no concept of PDF or anything else, and the browser and web server ensure end-to-end encryption and authentication of every bit.
Unless you've done something like HTTPS man-in-the-middle.
Hint: Whatever you're seeing is not pfSense. Unless maybe ^
ETA: Downloads fine here: SHA256(UniFi_Controller_V5_UG.pdf)= 7542671dac5d5f743ae4e56529872cff3b70f4e6557d947537926647785baab7
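For anyone who wants to compare their own copy against the hash above, here is a minimal Python sketch that computes both the MD5 and SHA-256 of a downloaded file. The function name `file_digests` and the local filename in the usage comment are just illustrative assumptions; the expected SHA-256 is the value quoted in this post.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return the MD5 and SHA-256 hex digests of the file at `path`.

    The file is read in chunks so a large download does not have to fit
    in memory. `file_digests` is a hypothetical helper name.
    """
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


# SHA-256 quoted above for a known-good download of the PDF.
EXPECTED_SHA256 = "7542671dac5d5f743ae4e56529872cff3b70f4e6557d947537926647785baab7"

# Usage (assumes the PDF was saved under this name in the current directory):
#   digests = file_digests("UniFi_Controller_V5_UG.pdf")
#   print(digests, digests["sha256"] == EXPECTED_SHA256)
```

Running it once per download (browser vs. curl, old router vs. new) gives directly comparable values.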
-
Yeah, that all occurred to me, yet I still turned off the Squid transparent proxy even though I knew it shouldn't affect HTTPS traffic. To be honest, I didn't expect moving my Ethernet connection from the pfSense box to the original router to make a difference, yet it did.
Anyway, mystery (mostly) solved after stepping back and looking at the packet captures more closely. The PDF file is hosted on Amazon CloudFront, a content delivery network. It turns out that I was downloading the PDF from different servers depending on which device I was using as a router. Not too surprising in retrospect, since different DNS resolvers can have different answers in their caches. I think what really threw me (apart from sitting at my computer for too many hours straight) was that curl always downloaded the correct content, even when I was connected through my new pfSense installation. For whatever reason, curl on OS X was consistently getting the 'good' IP, while the browsers consistently used the 'bad' IP, which matched what I got when using dig against the pfSense resolver.
In any case, my confidence in my new installation is restored, and I guess I'm just going to have to live with the curl vs. browser DNS resolution mystery.
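In case it helps anyone chasing something similar, here is a rough Python sketch of the comparison I ended up doing by hand: list the IPv4 addresses the system resolver returns for the download host, then diff that against the answers seen on another path (the other router, dig, or the addresses in a packet capture). The helper names `resolve_ipv4` and `diff_answers` are made up for illustration. Note that `socket.getaddrinfo` asks whatever resolver the OS is configured to use, which is not necessarily the one a browser uses.

```python
import socket


def resolve_ipv4(hostname, port=443):
    """Return the sorted, de-duplicated IPv4 addresses for `hostname`.

    Uses the operating system's configured resolver, so the result depends
    on which router/DNS server the machine is currently pointed at.
    """
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) tuple.
    return sorted({info[4][0] for info in infos})


def diff_answers(answers_a, answers_b):
    """Compare two lists of IP addresses collected on different network paths."""
    a, b = set(answers_a), set(answers_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Usage: run resolve_ipv4("dl.ubnt.com") once behind each router, save the
# two lists, then pass them to diff_answers() to see which servers differ.
```

CDN answers rotate, so some difference between runs is normal; what matters is whether the bad download always comes from an address that only one resolver hands out.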