Netgate 3100 URL unknown
-
@stephenw10 : Hi Steve; did you find the original call?
-
Hey, yes I was reviewing the ticket and there was definitely some confusion as to what the setup is. It's unusual!
So as I said, we did find a bug that affects the configured proxy for pkg fetches. However, it looked like it couldn't apply here because Squid is still seeing the requests.
Do you know what the expected route would be for traffic that isn't set explicitly via the proxy? Would it just fail?
I've attached here an experimental patch for the issue we did find. If you're able to test that it will at least rule that out as the cause. I've tested it here in 23.05.1 on a 3100.
Steve
-
@stephenw10 : Hi
I'll apply the patch and report back asap. And yes, all non-proxy calls will be denied (there is generally no Internet access in the local net), rejected (e.g. NXDOMAIN), or timed out. The only machine that sees the Internet is the proxy.
P.S.: I still don't see the ticket in my tac portal.
-
@stephenw10 said in Netgate 3100 URL unknown:
I've attached here an experimental patch for the issue we did find. If you're able to test that it will at least rule that out as the cause. I've tested it here in 23.05.1 on a 3100.
P.S.: How should I apply the patch? Does not seem to be a shell script...
-
Yup, your ticket was moved to a different queue that should have been visible to you but for some reason isn't. We opened an internal bug for that and moved it back so you should see it again now.
You can apply that patch using the system patches package: https://docs.netgate.com/pfsense/en/latest/development/system-patches.html
However now I'm thinking that you probably can't install that package on an affected device! So you may have to use the patch command directly:
[23.05.1-RELEASE][admin@fw1.stevew.lan]/root: fetch https://forum.netgate.com/assets/uploads/files/1693952859372-local_proxy.patch
1693952859372-local_proxy.patch 1378 B 1613 kBps 00s
[23.05.1-RELEASE][admin@fw1.stevew.lan]/root: patch -d / -t -l -p 2 -i /root/1693952859372-local_proxy.patch
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/src/etc/inc/pkg-utils.inc b/src/etc/inc/pkg-utils.inc
|index a31dd38748..8decf26f3c 100644
|--- a/src/etc/inc/pkg-utils.inc
|+++ b/src/etc/inc/pkg-utils.inc
--------------------------
Patching file etc/inc/pkg-utils.inc using Plan A...
Hunk #1 succeeded at 1508 (offset 1 line).
done
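Before applying a patch to a live firewall it can help to check first whether it would apply at all. The following is a minimal, self-contained sketch of that idea using `patch --dry-run`; the scratch directory, the `demo.inc` file and its contents are made up for illustration and have nothing to do with the real `pkg-utils.inc`.

```shell
#!/bin/sh
# Demo: test whether a unified diff applies cleanly WITHOUT modifying the target.
# Everything lives in a throwaway directory; all file names are hypothetical.
set -eu

work=$(mktemp -d)
mkdir -p "$work/a/etc" "$work/b/etc" "$work/root/etc"

# "a" is the old version, "b" the new one, "root" stands in for the live tree.
printf 'one\ntwo\nthree\n' > "$work/a/etc/demo.inc"
printf 'one\nTWO\nthree\n' > "$work/b/etc/demo.inc"
cp "$work/a/etc/demo.inc" "$work/root/etc/demo.inc"

# diff exits with status 1 when the files differ, so tolerate that here.
(cd "$work" && diff -u a/etc/demo.inc b/etc/demo.inc > demo.patch) || true

# --dry-run only reports what would happen; -p 1 strips the leading "a/" or "b/".
if patch --dry-run -d "$work/root" -t -l -p 1 -i "$work/demo.patch" >/dev/null 2>&1; then
  result="would apply cleanly"
else
  result="would NOT apply"
fi
echo "$result"

# Confirm the dry run really left the target file as it was.
if cmp -s "$work/a/etc/demo.inc" "$work/root/etc/demo.inc"; then
  untouched=yes
else
  untouched=no
fi
echo "target untouched: $untouched"

rm -rf "$work"
```

On the firewall itself the same check would presumably be the command above with `--dry-run` added, i.e. `patch --dry-run -d / -t -l -p 2 -i /root/1693952859372-local_proxy.patch`. If the dry run reports a failed hunk, nothing has been changed and no `.rej` file needs cleaning up.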
-
@stephenw10
Done exactly the shell way:
[23.05-RELEASE][root@]/root: fetch https://forum.netgate.com/assets/uploads/files/1693952859372-local_proxy.patch
1693952859372-local_proxy.patch 1378 B 4592 kBps 00s
[23.05-RELEASE][root@]/root: patch -d / -t -l -p 2 -i /root/1693952859372-local_proxy.patch
Hmm... Looks like a unified diff to me...
The text leading up to this was:
|diff --git a/src/etc/inc/pkg-utils.inc b/src/etc/inc/pkg-utils.inc
|index a31dd38748..8decf26f3c 100644
|--- a/src/etc/inc/pkg-utils.inc
|+++ b/src/etc/inc/pkg-utils.inc
Patching file etc/inc/pkg-utils.inc using Plan A...
Hunk #1 succeeded at 1508 (offset 1 line).
done
I have now rebooted and will check.
I tried the same on a 22.05.1:
[22.05.1-RELEASE][root@]/root: patch -d / -t -l -p 2 -i /root/1693952859372-local_proxy.patch
Hmm... Looks like a unified diff to me...
The text leading up to this was:
|diff --git a/src/etc/inc/pkg-utils.inc b/src/etc/inc/pkg-utils.inc
|index a31dd38748..8decf26f3c 100644
|--- a/src/etc/inc/pkg-utils.inc
|+++ b/src/etc/inc/pkg-utils.inc
Patching file etc/inc/pkg-utils.inc using Plan A...
Hunk #1 failed at 1507.
1 out of 1 hunks failed--saving rejects to etc/inc/pkg-utils.inc.rej
done
[22.05.1-RELEASE][root]/root:
I'll check out the repo from this machine and report back. As both machines are running as servers (proxy), I can only check/reboot late at night.
P.S.: Now I see the ticket again. Should we continue there instead of here?
-
@stephenw10
As expected, it did not change the unavailability of packages or OS updates (see pic).
While the patch download worked (*.netgate.com is allowed), the "error 500 to ews" does not seem right, and the 503 may be a follow-up error.
In the ticket, one of your colleagues analyzed the tcpdump and asked why the (local) DNS is replying with NXDOMAIN to the DNS call for the pfsense-plus SRV address. Unfortunately there was no further info after I explained that the "first DNS is purely local, only the proxy has further info". The tcpdump may be more interesting, especially because it shows both working and no-longer-working downloads from two clients (V22.x and V23.x).
-
Ok, yes let's move this back to the ticket. We can update this when the problem/solution is identified.
-
@stephenw10 : Hi Steve
do you have any status report for me? My CTO has already asked me why I'm still stalling the deployment.
I put all the info into my TAC ticket, but there has been no reply from you there so far?!
-
Hmm, let me check. The 23.09 release testing has taken over all my spare cycles right now.
-
@stephenw10 :
Hi Steve
any update or still locked into 23.09 testing?
If it helps I'd draw a networking diagram how the data flow works and how you'd reproduce it (quite simple actually).
Cheers
Michael
-
Sorry 23.09 is using all my time. With any luck that should be imminent though and then I can look at it. And you can go straight to 23.09 at that point.
I'll try to get the setup replicated before that.
-
@stephenw10 Hi Steve, do you know when the tech team will be able to dig into the case? We have been stuck since May, which is getting harder to cope with every week.
-
Yes, I'm digging into this again today.
-
@stephenw10 said in Netgate 3100 URL unknown:
Yes, I'm digging into this again today.
Hi; I'm looking forward to it :-)
-
Ok I have this working with chained Squid instances.
Can you remind me what the special authentication requirement was that meant Squid was needed? It works fine with basic authentication, but if I can test something closer to whatever the Bluecoat needs, that would be better.
-
Oh I see your reply on the ticket! Testing LDAP....
-
@stephenw10 : Hi Steve
I've included the Squid conf in the TAC ticket. The login is just unusual for the LDAP style, at about 80 characters long, but otherwise not that special.
Now that your chained Squids are working: does the lowest box have an Internet DNS resolver (or some way to resolve Internet DNS names) or, like ours, only local addresses? My first ticket about this case was about the SRV call and the 503 reply from our uplink proxy.
AFAIR I made a picture (in the TAC ticket) of how the boxes are connected and what they "see".
-
@stephenw10 said in Netgate 3100 URL unknown:
Oh I see your reply on the ticket! Testing LDAP....
Not sure if digging into the LDAP helps at all: the problem only occurs on modern pfSense versions but not on older ones, so the basic call must have changed, not the Squid config.
In the tcpdump we did at first, one of your techs asked us why we (our local DNS) answer the DNS call with NXDOMAIN, so the initiation is going wrong. That was never answered or followed up afterwards, so I'm not sure whether the failed DNS lookup is then forwarded to the proxy and answered there, in which case there would be no need to worry.
-
Yeah, I agree that using LDAP auth instead of local seems unlikely to be the cause.
My test box does currently have local DNS resolution. I'll try removing that.
I don't think this is an SRV issue though, since the pkg servers have been using that for years. Also, you still saw this issue in 22.05.1, which would have been using the same known-working Squid version but with the new pkg system.
By far the most significant thing that changed is that the pkg servers used by the dynamic repo system require client certs to access them. To do that through a proxy obviously relies on the proxy correctly passing the cert from the client to the upstream server.
More recently the pkg binary switched from fetch to curl which added at least one known bug but should not be an issue when using local Squid.
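To check the client-cert point by hand, one option is to send a single request with curl through the local Squid while presenting a client certificate. The sketch below only builds and prints such a command unless `RUN=1` is set. The URL, certificate path and key path are placeholders and not the real pfSense locations; the proxy address assumes Squid on its default port 3128. With a plain CONNECT tunnel the TLS handshake, including the client cert, runs end to end between client and server, so a proxy that intercepts TLS would be the case to look out for.

```shell
#!/bin/sh
# Sketch: present a TLS client certificate to an HTTPS server via an HTTP proxy.
# ALL values below are placeholders (assumptions), not real pfSense hosts or paths.
set -eu

PROXY="${PROXY:-http://127.0.0.1:3128}"              # local Squid, default port assumed
URL="${URL:-https://pkg.example.invalid/meta.conf}"  # hypothetical repo URL
CERT="${CERT:-/tmp/example-client.crt}"              # hypothetical client cert
KEY="${KEY:-/tmp/example-client.key}"                # hypothetical client key

# Human-readable form of the command, printed in dry-run mode.
cmd="curl -v --proxy $PROXY --cert $CERT --key $KEY -o /dev/null $URL"

if [ "${RUN:-0}" = "1" ]; then
  # -v prints the CONNECT request to the proxy and the TLS handshake details.
  curl -v --proxy "$PROXY" --cert "$CERT" --key "$KEY" -o /dev/null "$URL"
else
  echo "$cmd"
fi
```

Comparing the verbose output of the same request from a working and a non-working box should at least show whether the failure happens at the CONNECT to the proxy or later, during the TLS handshake with the upstream server.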
Steve