[Solved] 2.3.4 Unable to retrieve package information on secondary pfsense
-
Hi,
A bit of background on my setup:
I have two pfSense firewalls with CARP set up between them, and XMLRPC sync running as well. There is also a multi-WAN component: we have two providers, one gives us a single static IP and the other gives us a /28, so 13 usable IPs. On the primary pfSense both providers are set up; on the secondary only one is physically connected. If the primary fails, we have to manually connect the second provider if we want to maintain both providers. I should also state that we don't use the multi-WAN setup for load balancing, but rather for failover.
The primary pfsense was upgraded from 2.3.3, the secondary pfsense was a clean install of 2.3.4. I first noticed this issue on the secondary firewall last week, after maybe a month of running on 2.3.4.
On the primary pfsense I am able to retrieve the current package information and the list of available package information, but on the secondary pfsense I am not.
Via the shell, I get the following:
[2.3.4-RELEASE][admin@georgia.vitals.healthcare]/root: pkg update
Updating pfSense-core repository catalogue...
pkg: Repository pfSense-core load error: access repo file(/var/db/pkg/repo-pfSense-core.sqlite) failed: No such file or directory
pkg: https://pkg.pfsense.org/pfSense_v2_3_4_amd64-core/meta.txz: Network is unreachable
repository pfSense-core has no meta file, using default settings
pkg: https://pkg.pfsense.org/pfSense_v2_3_4_amd64-core/packagesite.txz: Network is unreachable
Unable to update repository pfSense-core
Updating pfSense repository catalogue...
pkg: Repository pfSense load error: access repo file(/var/db/pkg/repo-pfSense.sqlite) failed: No such file or directory
pkg: https://pkg.pfsense.org/pfSense_v2_3_4_amd64-pfSense_v2_3_4/meta.txz: Network is unreachable
repository pfSense has no meta file, using default settings
pkg: https://pkg.pfsense.org/pfSense_v2_3_4_amd64-pfSense_v2_3_4/packagesite.txz: Network is unreachable
Unable to update repository pfSense
Error updating repositories!
If I do nslookup on the primary pfsense with the record type set to srv I get:
[2.3.4-RELEASE][admin@michigan.vitals.healthcare]/root: nslookup
> set type=srv
> _https._tcp.pkg.pfsense.org
Server: 127.0.0.1
Address: 127.0.0.1#53
Non-authoritative answer:
_https._tcp.pkg.pfsense.org service = 10 10 443 files01.netgate.com.
_https._tcp.pkg.pfsense.org service = 10 10 443 files00.netgate.com.
Authoritative answers can be found from:
pfsense.org nameserver = ns1.netgate.com.
pfsense.org nameserver = ns2.netgate.com.
>
This looks good to me.
However on the secondary I get something different:
[2.3.4-RELEASE][admin@georgia.vitals.healthcare]/root: nslookup
> set type=srv
> _https._tcp.pkg.pfsense.org
;; Got SERVFAIL reply from 127.0.0.1, trying next server
Server: 8.8.8.8
Address: 8.8.8.8#53
Non-authoritative answer:
_https._tcp.pkg.pfsense.org service = 10 10 443 files01.netgate.com.
_https._tcp.pkg.pfsense.org service = 10 10 443 files00.netgate.com.
Authoritative answers can be found from:
>
This looks quite different. It is using Google instead of the local host, and it is not finding authoritative answers.
If I try to force it to use the local dns server (127.0.0.1) it fails:
[2.3.4-RELEASE][admin@georgia.vitals.healthcare]/root: nslookup
> server 127.0.0.1
Default server: 127.0.0.1
Address: 127.0.0.1#53
> set type=srv
> _https._tcp.pkg.pfsense.org
Server: 127.0.0.1
Address: 127.0.0.1#53
** server can't find _https._tcp.pkg.pfsense.org: SERVFAIL
>
I'm trying to identify where the issue is coming from. I'm fairly new to pfsense so I'm not sure what the correct expectations are. Should the nslookup be the same on both pfsense devices?
The system logs show that dns resolver (unbound) is restarting every few minutes:
Jul 3 12:37:50 unbound 30721:0 notice: Restart of unbound 1.6.1.
Jul 3 12:37:52 unbound 30721:0 notice: Restart of unbound 1.6.1.
Jul 3 12:41:42 unbound 30721:0 notice: Restart of unbound 1.6.1.
Jul 3 12:41:44 unbound 30721:0 notice: Restart of unbound 1.6.1.
The last restart on the primary pfSense was on June 27th, after I performed a package upgrade for the OpenVPN CVE.
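To put a number on how often unbound is restarting, here is a minimal sketch that tallies the "Restart of unbound" notices per minute. It assumes syslog-style lines like the ones above, with month, day and HH:MM:SS as the first three fields. The function name restarts_per_minute is made up for illustration, and the log path in the usage comment is an assumption about where a 2.3.x install keeps the resolver log.

```shell
# Tally "Restart of unbound" notices per minute from syslog-style lines on stdin.
# Assumes the first three whitespace-separated fields are month, day, HH:MM:SS.
restarts_per_minute() {
    awk '/Restart of unbound/ {
        split($3, t, ":")                    # t[1] = HH, t[2] = MM
        key = $1 " " $2 " " t[1] ":" t[2]    # e.g. "Jul 3 12:37"
        n[key]++
    }
    END { for (k in n) print k, n[k] }' | sort
}

# Usage (assumption: the resolver log is a circular log read with clog):
#   clog /var/log/resolver.log | restarts_per_minute
```

Each output line is a minute followed by the number of restarts logged in it, so a healthy resolver should produce little or no output.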
So possibly there is an issue with unbound itself on the secondary pfsense? If so, how would I troubleshoot further? Or am I barking up the wrong tree altogether?
Thanks,
Shane
-
"Network is unreachable" sounds like the default gateway ain't working properly.
Can you "ping 8.8.8.8"? Or check under Diagnostics > Routes if a default route is there?
-
Check your outbound NAT rules. Odds are you have a rule with a source of ANY, or another rule which NATs all outbound traffic (including the firewall's own traffic) to a CARP VIP, which is not a correct configuration.
Change the rules to match only your local/private networks as a source network or alias.
-
Thanks PiBa for the suggestion.
"Network is unreachable" sounds like the default gateway ain't working properly.
Can you "ping 8.8.8.8"? Or check under Diagnostics > Routes if a default route is there?
I was able to ping 8.8.8.8, and when I manually set 8.8.8.8 as the resolving server, lookups for other hostnames get a response. I also confirmed that the routing table lists a default route. So it doesn't seem to be related to routing.
-
Thanks jimp as well for the suggestion.
Check your outbound NAT rules. Odds are you have a rule with a source of ANY, or another rule which NATs all outbound traffic (including the firewall's own traffic) to a CARP VIP, which is not a correct configuration.
Change the rules to match only your local/private networks as a source network or alias.
We are using manual outbound NAT rules; I believe this was enabled because of failover with gateway groups. However, every entry has a specific source, and a destination pointing to either the VIP for the WAN interface of one provider, or the IP of the other provider. I've attached a screenshot showing the rules.
-
It may or may not affect this, but your 127.0.0.0/8 NAT rules definitely should NOT be using a NAT address of a CARP VIP, set those to be an interface address.
-
Thanks for the quick reply. I updated the outbound rules as suggested, but unfortunately it didn't resolve the issue. To the point you're raising, though: should only the LAN entries have a translation address of the CARP VIP? Does this include the networks used for VPNs as well? Off-topic, I know, so no problem if you'd rather not answer that here. Thanks.
-
Anything that requires NAT from local networks outbound should have NAT rules mapping it to a CARP VIP. If you have any public/routable networks those shouldn't have NAT applied either.
The only other exclusion should be traffic from the firewall itself (its own WAN interface addresses, 127.x.x.x), which needs to leave without NAT, or you'll get the exact problems you describe. When your particular problem scenario happens, it's almost always NAT.
So if it isn't NAT, then double check your routing and DNS. Make sure the secondary has a default gateway set and that it shows as default under Diagnostics > Routes. Also make sure that if you have IPv6 configured in any capacity, it is fully configured and operational; if it's only partially configured, the firewall could be trying and failing to use IPv6.
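The routing check can also be done from the shell by pulling the default entry out of the routing table. The sketch below assumes the FreeBSD netstat -rn column layout (Destination, Gateway, Flags, Netif). default_route is a hypothetical helper name, not a pfSense command.

```shell
# Print "<gateway> <interface>" for each default route found in
# `netstat -rn`-style text on stdin; exit non-zero if there is none.
# Assumes columns: Destination Gateway Flags Netif [Expire].
default_route() {
    awk '$1 == "default" { print $2, $4; found = 1 }
         END { exit !found }'
}

# Usage on the firewall (IPv4 table only):
#   netstat -rn -f inet | default_route
```

A non-zero exit status means no default route was found. Otherwise, compare the printed gateway and interface against the WAN that is actually up on that node.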
-
Thanks jimp, that was it. The default gateway was pointed to the WAN interface that's down. I guess I misunderstood the use of gateway groups and thought that if the primary in a gateway group is down, it would automatically use the gateway that is up. Does that require setting the option in System -> Advanced -> Miscellaneous -> "Default Gateway Switching"? From what I understood, this wasn't necessary if enabling gateway groups. Thanks.
-
Gateway groups do not influence traffic from the firewall itself. Not yet, at least. There is some work under way to make it possible to select a default gateway group for use with default gateway switching, so that it can more intelligently choose which gateway to use for the firewall itself.