Latest batch of Patches broke various things like WG, PBR, etc.
-
Hi,
This morning I updated one of our remote offices running pfSense Plus 23.01. That site was working apart from a minor problem I reported to @jimp in another thread (VPN gateway upper/lowercase problem). After seeing many new patches in the System Patches package, I applied them all and rebooted.
Seems that was a big mistake. The system came up with various problems:
- many packages and services were not started (HAProxy, WireGuard, Tailscale, FreeRADIUS etc.)
- after manually starting everything that was stopped, WireGuard and the UPnP daemon remained stopped
- a quick edit of the config in the UPnP service brought it up again (just opening it and saving again)
- NO VPN traffic worked: the problem from the other ticket was back (the gateway name's upper/lowercase had reverted, so I had to edit ALL rules again)
But after fiddling around, there are still issues remaining that simply won't go away:
- WireGuard is still shown as "down" (service down) and I can't stop/start it. Weirdly enough, the WG tunnel instance reports up with an active peer (the tunnel peer on the other side), and the RAS service also seems to work, despite the service status being down and no active wg process to be found on the system. A mobile phone seems to connect and shows up in the WG status as connected a few seconds ago, yet the service is still down. And since the new behavior for WireGuard is that you can't disable and re-enable the service while interfaces are assigned, I'm not really happy about deleting two working setups and re-creating them just to stop/start that service.
- The NUT package is HYPERACTIVE and gets stopped and started every. freakin. time! Any time you touch another package or service and save/restart/edit it, or try to restart/reinstall a package, it shouts its messages (I use Pushover on those systems): NUT reports having lost the connection to the UPS and then re-establishing it a split second later. It's really annoying.
- VPN traffic via policy-based rules is down and no longer works. After editing a rule back to the correct gateway, it just doesn't seem to pick up traffic at all. It doesn't log, it doesn't create a state, it's simply ignored. I'll try another reboot, but it's tedious.
Also (probably unrelated?), as of this morning I can't get any updates anymore, as the system reports (from the console):
... pkg: https://pfsense-plus-pkg00.atx.netgate.com/pfSense_plus-v23_01_amd64-pfSense_plus_v23_01/packagesite.pkg: Authentication error
Certificate verification failed for /C=US/ST=Texas/L=Austin/O=Rubicon Communications, LLC (Netgate)/OU=pfSense Plus/CN=pfsense-plus-pkg00.atx.netgate.com
35160031232:error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/sources/FreeBSD-src-plus-RELENG_23_01/crypto/openssl/ssl/statem/statem_clnt.c:1921:
Certificate verification failed for /C=US/ST=Texas/L=Austin/O=Rubicon Communications, LLC (Netgate)/OU=pfSense Plus/CN=pfsense-plus-pkg00.atx.netgate.com
35160031232:error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/sources/FreeBSD-src-plus-RELENG_23_01/crypto/openssl/ssl/statem/statem_clnt.c:1921:
Certificate verification failed for /C=US/ST=Texas/L=Austin/O=Rubicon Communications, LLC (Netgate)/OU=pfSense Plus/CN=pfsense-plus-pkg00.atx.netgate.com
35160031232:error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/sources/FreeBSD-src-plus-RELENG_23_01/crypto/openssl/ssl/statem/statem_clnt.c:1921:
pkg: https://pfsense-plus-pkg00.atx.netgate.com/pfSense_plus-v23_01_amd64-pfSense_plus_v23_01/packagesite.txz: Authentication error
Unable to update repository pfSense
Error updating repositories!
So perhaps some problems may be related to this, but it's annoying as hell that integrating "small patches" killed the setup like this.
Any help appreciated
Cheers
\jens
-
Looking into it further, it seems the patch for the VPN gateway upper/lowercase problem made the name uppercase again, BUT the setting in System/Advanced/Misc about skipping rules when gateways are down seems to HIT, although the gateway is up, working, and the remote end is pingable without problems. I think the check whether the GW is up/down is perhaps still done the old "wrong" way, with the GW name in lowercase instead of UPPERCASE as it is now, and so those policy-based rules get skipped?
Just thinking out loud here, as the content of rules.debug seems to indicate exactly this: all PBRs are now commented out with "rule disables because gateway VPN_NAME_VPNV4 is down".
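To check the theory above against the actual file, here is a minimal Python sketch (not pfSense code) that counts the commented-out rules in a copy of rules.debug and flags gateways whose name matches an "up" gateway only when case is ignored. The comment wording is taken from the message quoted above and may differ between versions, so the pattern only keys on "gateway NAME is down". The function names are made up for illustration, and the list of "up" gateway names is an assumption that has to be supplied by hand (e.g. copied from Status > Gateways).

```python
import re

# Assumed comment format, based on the message quoted in this post.
# Only the "gateway <NAME> is down" part of a comment line is relied upon.
GATEWAY_DOWN = re.compile(r"^\s*#.*\bgateway\s+(\S+)\s+is\s+down", re.IGNORECASE)


def disabled_gateways(rules_text):
    """Return {gateway_name: count} for comment lines blaming a down gateway."""
    counts = {}
    for line in rules_text.splitlines():
        match = GATEWAY_DOWN.search(line)
        if match:
            name = match.group(1)
            counts[name] = counts.get(name, 0) + 1
    return counts


def case_only_mismatches(disabled, gateways_up):
    """List gateways blamed as down whose name equals an 'up' gateway
    only when upper/lowercase is ignored (hypothetical helper)."""
    up_exact = set(gateways_up)
    up_folded = {name.lower() for name in gateways_up}
    return sorted(
        name
        for name in disabled
        if name not in up_exact and name.lower() in up_folded
    )
```

Feed `disabled_gateways()` the text of rules.debug (usually /tmp/rules.debug on the firewall, or a copy of it elsewhere). If `case_only_mismatches()` returns anything, the "down" verdict is at least consistent with a name comparison that differs only in case; it does not prove that this is what the firewall does internally.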
Editing the rules further showed that the rules aren't even written anymore. Changing the description of a rule should have resulted in a comment with a different rule name, but the name stayed the same. So I ran filter reload manually after changing the rules and (AHA!) the comments in the rules.debug file vanished! So it seems the filter rules weren't even applied after editing, saving, and applying in the filter view. That's weird!
The rules for the WG interfaces are still reported as down though, and even WireGuard still thinks it's down despite passing traffic. That is some weird thing... Renewing the assigned VPN interfaces also triggered endless up/downs of the NUT package again when the VPN tunnel was restarted and the interface reassigned. Why does NUT always have to restart/reload when interface actions occur that it has nothing to do with?
-
Seems the PBR problem is part of a bigger one. The alias, which holds a list of entries that should be routed via PBR and is used as the destination in the ruleset, is miraculously empty. I guess that has something to do with another patch, but either way it is empty and thus routing doesn't work.
Edit: sigh. It's FQDNs. Aliases with FQDNs don't get repopulated correctly and are simply empty, and thus don't work. I thought we already had that one taken care of...
That seems very much like https://redmine.pfsense.org/issues/9296 again?
But it's weirder, as it seems that currently no change to aliases, NAT, or rules gets applied at all, and I have to do a manual filter reload every time for it to show up. I just checked by duplicating an FQDN alias, and it didn't show up in the tables. Only after a manual filter reload was it there, but empty. What's going on?
Trying to reboot again after deleting all the WireGuard weirdness, hoping to get at least all other functions back.
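For anyone wanting to check their own FQDN aliases for the same symptom, the comparison I did by hand could be sketched roughly like this in Python. This is only an illustration under assumptions: `check_alias` and `resolve_host` are made-up names, the table contents have to be pasted in from the output of `pfctl -t ALIAS_NAME -T show` (ALIAS_NAME being a placeholder for the alias), and the machine running the script may resolve a name differently than the firewall does.

```python
import socket


def resolve_host(hostname):
    """Resolve a hostname to a set of IP address strings (IPv4 and IPv6)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return {info[4][0] for info in infos}


def check_alias(fqdns, table_entries, resolver=resolve_host):
    """Compare what an FQDN alias should contain with what the pf table holds.

    fqdns: hostnames configured in the alias.
    table_entries: lines from the table listing, one address per line.
    resolver: function mapping a hostname to a set of addresses.
    """
    present = {entry.strip() for entry in table_entries if entry.strip()}
    expected = set()
    for name in fqdns:
        expected |= resolver(name)
    return {
        "table_empty": not present,
        "missing": sorted(expected - present),
    }
```

A result with `table_empty` set to True while `missing` lists addresses matches what I saw: the names resolve fine, but the table was never filled.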
...
Edit2: OK, that third? fourth? reboot seems to have helped: all aliases, including DNS aliases, were repopulated correctly at boot time. That way the PBRs are working again and the VPN GWs are found again. Good. NUT is still going nuts about any small change in packages, interfaces etc. But hey, the main things are working normally again...'ish.
Funnily enough, now that I deleted the WireGuard interface assignments and static routes, after the reboot the service is in "started" mode again and seems to have started the tunnels (S2S and RAS) without their fixed interface binding. I will have to see whether I recreate the old settings or leave it running for now.
...
Edit3: What the fruck? After the latest reboot it seems the package repository failures have righted themselves (or Netgate's team has fixed its certificate?). Either way, pkg update is running again now.
-