Netgate 3100 URL unknown
-
@stephenw10 : Hi Steve
My TAC ticket somehow vanished (nothing is shown for that case), and I never received any further information after I answered the questions I was asked. What happened to the case? The ticket ID was (AFAIR) 1776782444.
-
Huh, I'm sorry about that. Let me check...
But to summarize: the issue appears to be that pkg uses the local Squid proxy correctly, but Squid then sends the SRV request incorrectly as a normal HTTPS call to the Bluecoat proxy, which rejects it.
And this didn't happen before 23.0X.
SRV calls have been used by pkg for years, so this looks like a regression in Squid. However, the entire pkg system was changed in 23.01.
Is there any test you can do to rule out one of those components? Perhaps log in to the upstream proxy manually and try to pull pkgs without Squid?
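A minimal sketch of that comparison from the pfSense shell, assuming FreeBSD's `fetch(1)`, which takes its proxy from the `HTTP_PROXY`/`HTTPS_PROXY` environment variables. The upstream proxy address, the Squid port (3128 is only Squid's default) and the test URL are hypothetical placeholders; use a URL your proxy policy actually allows. A fetch with no proxy at all is included because on a network like this one it is expected to fail.

```shell
#!/bin/sh
# Compare one URL fetched via the local Squid, via the upstream proxy
# directly, and with no proxy at all.
# All three values below are hypothetical placeholders.
LOCAL_SQUID="http://127.0.0.1:3128"
UPSTREAM_PROXY="http://upstream-proxy.example:8080"
TEST_URL="https://www.netgate.com/"

# try_fetch LABEL PROXY URL
# Prints "LABEL: ok" or "LABEL: failed". An empty PROXY means a direct
# connection (fetch -d ignores any proxy set in the environment).
try_fetch() {
    label=$1
    proxy=$2
    url=$3
    if [ -z "$proxy" ]; then
        fetch -d -q -T 15 -o /dev/null "$url"
    else
        HTTP_PROXY="$proxy" HTTPS_PROXY="$proxy" \
            fetch -q -T 15 -o /dev/null "$url"
    fi
    # $? is the exit status of whichever fetch ran above.
    if [ $? -eq 0 ]; then
        echo "$label: ok"
    else
        echo "$label: failed"
    fi
}

try_fetch "via local Squid" "$LOCAL_SQUID" "$TEST_URL"
try_fetch "via upstream proxy" "$UPSTREAM_PROXY" "$TEST_URL"
try_fetch "direct (expected to fail here)" "" "$TEST_URL"
```

If the upstream line succeeds while the Squid line fails for the same URL, that points at Squid rather than at pkg.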
Steve
-
Hi Steve
That's exactly what I suspected as well. It might be the newer Squid... but I have rarely had any update issues with Squid (which I have used in many instances since ~2000, in some very strange ways/configs). If you tell me which curl/wget check to run, I can fill in the rest to query the proxy directly.
Good to know that SRV is not an issue, as our proxy team did not answer my questions.
As V22.x and V21.x (still) work on the same proxy (= same machine), but 22.05.1 and above do not, it's surely a small thing that is done differently. The same happened from V21 to V22, when pkg queried the two Google DNS servers at script start and stopped working otherwise (so we had to build a workaround for that DNS lookup). => It works fine in Internet-connected environments with open DNS, but not in restricted areas.
Cheers
Michael
-
Do you actually mean 22.05.1? That was never released as an upgrade, only as a new install for the 8200 and some 2100s.
However, that was the first release that had the dynamic repo system, and it still used the same packages (including Squid) as 22.05. So if so, that seems like a big clue. The most likely explanation is that the new dynamic repo pkg system is not respecting the system proxy configuration. However, I'm unsure what you would see if that's the case. Would the upstream proxy see those requests at all?
When we were investigating this we did find a bug that might do that, but it looked like in your setup that would result in the upstream proxy simply not seeing anything.
-
@stephenw10 Yes, I mean since 22.05.1: our new 2100 models show the same error with this config, while the older SG3100s on 22.05 and below work as expected.
So I guessed it's the same problem. An SG3100 updated to V23.x and then downgraded (reinstalled from USB stick) to V22.05 (the latest version that really works) works with the same config.
We have about 60 SG3100s, 10 SG2220s and now 20 SG2100s. The SG2220s work up to V22.05, as do the 3100s as long as they stay below V23.01, and all the SG2100s (only four used/tested, the rest "reshelved until solved") show the same "no packages/plugins, no updates anymore" problem.
The local Squid sees the requests/fetches for the files, but the steps differ. I included the Squid logs from both the successful and the unsuccessful runs in the TAC ticket. Someone even analyzed the tcpdump log of all traffic, but apart from DNS lookup failures nothing was mentioned. The last entry I provided described this setup:
Internet <-> Bluecoat proxy LAN <-> pfSense WAN without Internet DNS (only local addresses resolvable) <-> LAN1-4 = our testing environment
So basically the pfSense is used as a LAN-to-LAN router + NAT, with only a proxy to fetch "usual" HTTP/HTTPS traffic (only ports 80 and 443 are allowed).
If you use two stacked pfSenses, where the inner one has no DNS and only a proxy uplink, it should be reproducible.
I have very limited visibility (Enterprise IT...) into what the Bluecoat/Plink proxy is doing/seeing. My best guess is the two different Squid fetches.
Have you found out what happened to the ticket?
Cheers
Michael
-
@stephenw10 said in Netgate 3100 URL unknown:
Is there any test you can do to rule out one of those components? Perhaps manually login to the upstream proxy and try to pull pkgs without Squid?
If you send me a URL to call, I'm glad to test it and post the Squid and/or (uplink) proxy reply.
-
@stephenw10 : Hi Steve, did you find the original case?
-
Hey, yes, I was reviewing the ticket and there was definitely some confusion about what the setup is. It's unusual!
So as I said, we did find a bug that affects the configured proxy for pkg fetches. However, it looked like it couldn't apply here because Squid is still seeing the requests.
Do you know what the expected route would be for traffic that isn't sent explicitly via the proxy? Would it just fail?
I've attached here an experimental patch for the issue we did find. If you're able to test it, that will at least rule it out as the cause. I've tested it here on 23.05.1 on a 3100.
Steve
-
@stephenw10 : Hi
I'll apply the patch and report back ASAP. And yes, all non-proxy calls will be denied (there is generally no Internet access in the local net), rejected (e.g. NXDOMAIN) or time out. The only thing that can see the Internet is the proxy.
P.S.: I still don't see the ticket in my TAC portal.
-
@stephenw10 said in Netgate 3100 URL unknown:
I've attached here an experimental patch for the issue we did find. If you're able to test that it will at least rule that out as the cause. I've tested it here in 23.05.1 on a 3100.
P.S.: How should I apply the patch? It does not seem to be a shell script...
-
Yup, your ticket was moved to a different queue that should have been visible to you but for some reason isn't. We opened an internal bug for that and moved it back so you should see it again now.
You can apply that patch using the system patches package: https://docs.netgate.com/pfsense/en/latest/development/system-patches.html
However, now I'm thinking that you probably can't install that package on an affected device! So you may have to use the patch command directly:
[23.05.1-RELEASE][admin@fw1.stevew.lan]/root: fetch https://forum.netgate.com/assets/uploads/files/1693952859372-local_proxy.patch
1693952859372-local_proxy.patch 1378 B 1613 kBps 00s
[23.05.1-RELEASE][admin@fw1.stevew.lan]/root: patch -d / -t -l -p 2 -i /root/1693952859372-local_proxy.patch
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/src/etc/inc/pkg-utils.inc b/src/etc/inc/pkg-utils.inc
|index a31dd38748..8decf26f3c 100644
|--- a/src/etc/inc/pkg-utils.inc
|+++ b/src/etc/inc/pkg-utils.inc
--------------------------
Patching file etc/inc/pkg-utils.inc using Plan A...
Hunk #1 succeeded at 1508 (offset 1 line).
done
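A patch made against one release may not fit another (on 22.05.1 later in this thread the hunk failed and left a `.rej` file behind). A minimal sketch that checks first with `patch --dry-run`, which FreeBSD's patch(1) accepts as a synonym for `-C`/`--check`, and applies the patch only when every hunk fits. The other flags are the ones from the command above; the function name `apply_patch` is hypothetical, for illustration only.

```shell
#!/bin/sh
# apply_patch FILE
# Dry-run FILE against / first; apply it only if every hunk would succeed.
# Flags mirror the manual command: -d / (work from the root), -t (batch
# mode, no questions), -l (loose whitespace matching), -p 2 (strip the
# leading "a/src/" so the target becomes etc/inc/pkg-utils.inc).
apply_patch() {
    file=$1
    if patch --dry-run -d / -t -l -p 2 -i "$file" >/dev/null; then
        patch -d / -t -l -p 2 -i "$file"
    else
        echo "$file does not apply cleanly here; nothing was changed" >&2
        return 1
    fi
}
```

Usage with the file fetched above: `apply_patch /root/1693952859372-local_proxy.patch`. To back the change out again, running the same patch command with `-R` added reverses it.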
-
@stephenw10
Done, exactly the shell way:
[23.05-RELEASE][root@]/root: fetch https://forum.netgate.com/assets/uploads/files/1693952859372-local_proxy.patch
1693952859372-local_proxy.patch 1378 B 4592 kBps 00s
[23.05-RELEASE][root@]/root: patch -d / -t -l -p 2 -i /root/1693952859372-local_proxy.patch
Hmm... Looks like a unified diff to me...
The text leading up to this was:
|diff --git a/src/etc/inc/pkg-utils.inc b/src/etc/inc/pkg-utils.inc
|index a31dd38748..8decf26f3c 100644
|--- a/src/etc/inc/pkg-utils.inc
|+++ b/src/etc/inc/pkg-utils.inc
Patching file etc/inc/pkg-utils.inc using Plan A...
Hunk #1 succeeded at 1508 (offset 1 line).
done
I've now rebooted and will check.
I tried the same on a 22.05.1:
[22.05.1-RELEASE][root@]/root: patch -d / -t -l -p 2 -i /root/1693952859372-local_proxy.patch
Hmm... Looks like a unified diff to me...
The text leading up to this was:
|diff --git a/src/etc/inc/pkg-utils.inc b/src/etc/inc/pkg-utils.inc
|index a31dd38748..8decf26f3c 100644
|--- a/src/etc/inc/pkg-utils.inc
|+++ b/src/etc/inc/pkg-utils.inc
Patching file etc/inc/pkg-utils.inc using Plan A...
Hunk #1 failed at 1507.
1 out of 1 hunks failed--saving rejects to etc/inc/pkg-utils.inc.rej
done
[22.05.1-RELEASE][root]/root:
I'll check the repo from this machine and report back. As both machines are running as servers (proxy), I can only check/reboot late at night.
P.S.: Now I see the ticket again. Should we continue there instead of here?
-
@stephenw10
As expected, it did not change the unavailability of packages or OS updates (see pic).
While the patch download worked (*.netgate.com is allowed), the "error 500 to ews" does not seem right, and the 503 may be a follow-up error.
In the ticket, one of your colleagues analyzed the tcpdump and asked why the (local) DNS replies with NXDOMAIN to the DNS query for the pfSense Plus SRV address. Unfortunately there was no further info after I explained that the first DNS is purely local and only the proxy has further info. The tcpdump may be more interesting, especially because it shows both working and no-longer-working downloads from two clients (V22.x and V23.x).
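To see which lookup is being answered with NXDOMAIN, it helps to know the record name. As far as I know, a pkg repository configured with `mirror_type: "srv"` resolves `_<scheme>._tcp.<host>` (for an HTTPS repo, `_https._tcp.<host>`) before fetching. The sketch below derives that name from a repo URL and queries it with `drill`, which ships with the FreeBSD base system. The repo URL is a hypothetical placeholder; take the real one from the repo configuration on the box. On a LAN where only local names resolve, NXDOMAIN is the expected answer, since a DNS query goes to the local resolver and never passes through an HTTP proxy.

```shell
#!/bin/sh
# srv_name URL
# Prints the SRV record name _<scheme>._tcp.<host> for URL
# (any port and path are stripped from the host part).
srv_name() {
    url=$1
    scheme=${url%%://*}
    rest=${url#*://}
    host=${rest%%/*}
    host=${host%%:*}
    printf '_%s._tcp.%s\n' "$scheme" "$host"
}

# Hypothetical repo URL: replace with the one from your repo config.
REPO_URL="https://pkg.example.com/some/repo"
NAME=$(srv_name "$REPO_URL")
echo "SRV query: $NAME"

# Query the local resolver; skip the lookup where drill is not installed.
if command -v drill >/dev/null 2>&1; then
    drill "$NAME" SRV
else
    echo "drill not found, skipping lookup"
fi
```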
-
Ok, yes let's move this back to the ticket. We can update this when the problem/solution is identified.
-
@stephenw10 : Hi Steve
Do you have any status update for me? My CTO has already asked me why I am still stalling the deployment.
I put all the info into my TAC ticket, but there has been no reply from you there so far?!
-
Hmm, let me check. The 23.09 release testing has taken over all my spare cycles right now.
-
@stephenw10 :
Hi Steve
Any update, or are you still locked into 23.09 testing?
If it helps, I can draw a network diagram showing how the data flows and how you could reproduce it (quite simple, actually).
Cheers
Michael
-
Sorry, 23.09 is using all my time. With any luck it should be imminent though, and then I can look at it. And you can go straight to 23.09 at that point.
I'll try to get the setup replicated before that.
-
@stephenw10 Hi Steve, do you know when the tech team will be able to dig into the case? We have been stuck since May, which is getting harder to cope with every week that goes by.
-
Yes, I'm digging into this again today.