"Tailscale is not online" problem
-
@chudak said in "Tailscale is not online" problem:
Do you know by chance how to catch errors/warnings related to TS stop/start in the logs?
You can search the old system logs in the /var/log directory, if I'm not mistaken. It shouldn't be hard to find: if the problem started yesterday at 15:00, look at the entries around that time.
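A minimal sketch of that log search, assuming the relevant logs are plain-text files named *.log directly under the log directory (rotated or compressed copies are not handled here). The function name search_ts_logs is made up for illustration.

```shell
#!/bin/sh
# Search a log directory for Tailscale-related lines.
# Assumption: the logs are plain-text *.log files directly under the directory;
# rotated/compressed copies are skipped.
search_ts_logs() {
    log_dir="${1:-/var/log}"
    # -i: case-insensitive match, -H: prefix each hit with its file name
    grep -iH 'tailscale' "$log_dir"/*.log 2>/dev/null
}
```

Something like `search_ts_logs /var/log | tail -n 50` would then show the most recent matches, which you can compare against the time the problem started.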
-
The instability of TS in conjunction with pfS is really frustrating.
I’m travelling now and TS is simply down with the error:
Error executing command (/usr/local/bin/tailscale status)
Health check:
- not logged in, last login error=invalid key: API key does not exist
unexpected state: NoState
So far nothing I’ve done (rebooting, deleting a lock file) has helped.
Thx G.. I still have OpenVPN as a backup option.
I’m surprised no one else is complaining about this…
-
@chudak did you create the key in the Tailscale admin console and then import it into pfSense?
Also, set the key to not expire.
-
@mcury said in "Tailscale is not online" problem:
@chudak did you create the key in the Tailscale admin console and then import it into pfSense?
Also, set the key to not expire.
I did set my key to not expire.
I have used the original key with pfS and did not regenerate it.
And I have seen it working normally after this error, so I suspect it’s unrelated.
I’m hesitant to mess with the keys now, as it was working literally two days ago.
-
@chudak said in "Tailscale is not online" problem:
@mcury said in "Tailscale is not online" problem:
@chudak did you create the key in the Tailscale admin console and then import it into pfSense?
Also, set the key to not expire.
I did set my key to not expire.
I have used the original key with pfS and did not regenerate it.
And I have seen it working normally after this error, so I suspect it’s unrelated.
I’m hesitant to mess with the keys now, as it was working literally two days ago.
OK, if the problem happens again, try creating a new key, importing it, and then setting it to "don't expire".
-
I want to improve the script above to make it "force" direct connections.
Another issue with this script is that it pings only once, and if that ping fails, it stops and then starts the service.
I think it would be much better if the script pinged 10 times and restarted the service only if 10 out of 10 pings fail.
This would increase the reliability of the script and, at the same time, make connections leave the relay and connect directly.
But I'm failing to do so. Any ideas to improve the code with the insights above in mind?
Edit:
I think I got it..
1- It will ping "headquarters" 10 times using tailscale.
This will help connections through tailscale prefer "direct" instead of relay.
2- If at least one of the tailscale pings works, it won't do anything.
This avoids the service being brought down every time.
3- If all pings fail, it will restart the tailscale service.
#!/bin/sh
# Ping one peer 10 times through Tailscale; restart the service only if every ping fails.
DEST="headquarters"
SUCCESS=0
COUNT=0

while [ $COUNT -le 9 ]
do
    COUNT=$((COUNT + 1))
    tailscale ping --c 1 --timeout 1s $DEST >/dev/null 2>/dev/null
    # ping -c 1 -t 100 $DEST
    if [ $? -eq 0 ]
    then
        SUCCESS=$((SUCCESS + 1))
    fi
done

if [ $SUCCESS -ge 1 ] && [ $COUNT -eq 10 ]
then
    # At least one ping was answered: leave the service alone.
    exit 0
else
    # All 10 pings failed: restart the tailscale service.
    /usr/local/sbin/pfSsh.php playback svc stop tailscale
    sleep 5
    /usr/local/sbin/pfSsh.php playback svc start tailscale
    sleep 5
    exit 1
fi
One important observation is, if there are more peers in the tailscale network, you can and should add them to this script.
See, if you are only pinging one host and that host goes down, the script will take the entire tailscale service down, affecting other hosts.
Code for multiple hosts:
#!/bin/sh
# Ping three peers in each of 10 rounds; restart the service only if no ping at all is answered.
DEST="server-1"
DEST1="server-2"
DEST2="server-3"
SUCCESS=0
COUNT=0

while [ $COUNT -le 9 ]
do
    COUNT=$((COUNT + 1))

    tailscale ping --c 1 --timeout 1s $DEST >/dev/null 2>/dev/null
    # ping -c 1 -t 100 $DEST
    if [ $? -eq 0 ]
    then
        SUCCESS=$((SUCCESS + 1))
    fi

    tailscale ping --c 1 --timeout 1s $DEST1 >/dev/null 2>/dev/null
    # ping -c 1 -t 100 $DEST1
    if [ $? -eq 0 ]
    then
        SUCCESS=$((SUCCESS + 1))
    fi

    tailscale ping --c 1 --timeout 1s $DEST2 >/dev/null 2>/dev/null
    # ping -c 1 -t 100 $DEST2
    if [ $? -eq 0 ]
    then
        SUCCESS=$((SUCCESS + 1))
    fi
done

if [ $SUCCESS -ge 1 ] && [ $COUNT -eq 10 ]
then
    # At least one peer answered: leave the service alone.
    exit 0
else
    # No peer answered in any round: restart the tailscale service.
    /usr/local/sbin/pfSsh.php playback svc stop tailscale
    sleep 5
    /usr/local/sbin/pfSsh.php playback svc start tailscale
    sleep 5
    exit 1
fi
The code above increments the SUCCESS variable for every answered ping; if any of the hosts answers, the tailscale service is considered UP and no action is taken.
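If the list of peers grows, repeating the ping block per host gets unwieldy. Below is a rough sketch of the same idea driven by a space-separated host list. The names PEERS, ROUNDS, count_successes and restart_needed are made up for this example, and the peer names are placeholders. It only defines functions, so nothing is restarted until you wire it up.

```shell
#!/bin/sh
# Hypothetical peer names; replace with hosts from your own tailnet.
PEERS="server-1 server-2 server-3"
ROUNDS=10

# Print how many pings were answered across all rounds and peers.
count_successes() {
    success=0
    round=0
    while [ "$round" -lt "$ROUNDS" ]; do
        for peer in $PEERS; do
            if tailscale ping --c 1 --timeout 1s "$peer" >/dev/null 2>&1; then
                success=$((success + 1))
            fi
        done
        round=$((round + 1))
    done
    echo "$success"
}

# Succeeds (returns 0) only when every ping to every peer failed.
restart_needed() {
    [ "$(count_successes)" -eq 0 ]
}

# Example wiring, left commented out so that sourcing this file has no side effects:
# if restart_needed; then
#     /usr/local/sbin/pfSsh.php playback svc stop tailscale
#     sleep 5
#     /usr/local/sbin/pfSsh.php playback svc start tailscale
# fi
```

Adding a peer is then a one-word change to PEERS, and the restart decision stays the same as in the script above: a single answered ping from any peer is enough to leave the service alone.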
-
I'm late to the party but unless I misunderstand this thread it's not about Tailscale not starting up but instead about the auth key expiring.
Auth keys are good for a maximum of 90 days. If you reboot pfSense on day 91, Tailscale will not come up and the "API" error will be generated (it's actually an auth key expired error).
Thus, unless you never reboot pfSense, starting with the 91st day, you must re-generate an auth key and input it to Tailscale even if you have key expiry disabled.
What makes this worse, IMHO, is that the longer you go between reboots, the more obscure the problem. So, Tailscale is not a reliable service because it cannot survive a reboot after 90 days.
This occurs on both CE 2.7.2 in a Protectli Vault Proxmox VM and on a real SG-1100 running Plus 24.11 (packages as distributed with those releases).
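To make the day-91 arithmetic concrete, here is a small sketch that takes the 90-day figure from this post at face value and computes how many days are left, given the time the key was created as a Unix timestamp. The names KEY_LIFETIME_DAYS and days_left are invented for illustration; you would have to record the creation time yourself.

```shell
#!/bin/sh
# Assumption: 90-day maximum auth key lifetime, as stated in the post.
KEY_LIFETIME_DAYS=90

# days_left CREATED_EPOCH [NOW_EPOCH]
# Prints the number of whole days until the key expires (negative once expired).
days_left() {
    created="$1"
    now="${2:-$(date +%s)}"
    age_days=$(( ($now - $created) / 86400 ))
    echo $(( KEY_LIFETIME_DAYS - age_days ))
}
```

A result of zero or less means a reboot at that point would hit the situation described above.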
-
@yobyot said in "Tailscale is not online" problem:
I'm late to the party but unless I misunderstand this thread it's not about Tailscale not starting up but instead about the auth key expiring.
Auth keys are good for a maximum of 90 days. If you reboot pfSense on day 91, Tailscale will not come up and the "API" error will be generated (it's actually an auth key expired error).
Thus, unless you never reboot pfSense, starting with the 91st day, you must re-generate an auth key and input it to Tailscale even if you have key expiry disabled.
What makes this worse, IMHO, is that the longer you go between reboots, the more obscure the problem. So, Tailscale is not a reliable service because it cannot survive a reboot after 90 days.
This occurs on both CE 2.7.2 in a Protectli Vault Proxmox VM and on a real SG-1100 running Plus 24.11 (packages as distributed with those releases).
I wish TS would introduce a new feature "a la Acme update" so that this would be done automatically.
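Until something like that exists, a cron job could at least notice the logged-out state early. The sketch below does not renew anything; it only checks text on stdin for the error strings quoted in this thread. The function name looks_logged_out is hypothetical.

```shell
#!/bin/sh
# Return 0 if the text on stdin contains one of the logged-out messages
# quoted in this thread, non-zero otherwise.
looks_logged_out() {
    grep -Eq 'not logged in|invalid key|You are logged out'
}
```

For example, `tailscale status 2>&1 | looks_logged_out && echo "Tailscale needs re-authentication"` could feed whatever notification you already use.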
-
@yobyot I think maybe it is the node key expiring at 180 days?
FWIW, I have discovered that running in a shell
tailscale down
tailscale up --force-reauth
will give you a URL you can paste into a browser. It re-authenticates and gets pfSense back online as the same machine, and the status shows this in the pf Tailscale UI. This re-authenticates the node key.
The node key shouldn't expire when you set it not to in the Tailscale admin console, but I just caught this note on https://tailscale.com/kb/1028/key-expiry
"A change to the Key Expiry value applies to any devices that are logged in after you make the change. The key expiration for any devices that are already logged in remains unchanged, until the next time the device is logged in."
So maybe when we set up pf Tailscale and then subsequently disable node key expiry, it doesn't take effect until re-auth, which maybe doesn't happen until
--force-reauth
and doesn't become apparent until after 180 days?
However, what I don't understand, and what undermines my theory somewhat, is that after doing the re-auth, I at first noticed that restarting Tailscale in the pf UI caused me to be logged out again with the error "You are logged out. The last login error was: invalid key: API key does not exist"
I manually updated Tailscale to 1.84.2 (see https://forum.netgate.com/topic/174525/how-to-update-to-the-latest-tailscale-version/155), basically by running
pkg add -f https://pkg.freebsd.org/FreeBSD:15:amd64/latest/All/tailscale-1.84.2.pkg
and then did tailscale down and tailscale up --force-reauth again. This time it made me re-sign the node after authenticating (I have tailnet lock on). Now restarting the service in the UI works.
Not sure yet what is going on and what role the new Tailscale pkg played. One thing I suspect may also be a factor is that
/usr/local/pkg/tailscale/state/tailscaled.state
is the state file holding the node key, instead of the standard
/var/db/tailscale/tailscaled.state
On pf Tailscale,
/usr/local/etc/rc.d/tailscaled
uses
/var/db/tailscale/tailscaled.state
as its state location, so maybe sometimes Tailscale somehow looks for state there and it doesn't exist.
But this wouldn't really explain why everything is fine for a while initially (usually 90 to 180 days; I'm not exactly sure in my case). It may explain why it logs out on reboot if you use a RAM disk, though.
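A quick way to see which of the two state files actually exists on a given box is sketched below. The two paths are the ones named in this post; the helper name report_state_file and the variable names are made up for the example.

```shell
#!/bin/sh
# Paths as named in this post.
PKG_STATE="/usr/local/pkg/tailscale/state/tailscaled.state"
RC_STATE="/var/db/tailscale/tailscaled.state"

# Report whether a state file exists and is non-empty.
report_state_file() {
    if [ -s "$1" ]; then
        echo "present: $1"
    else
        echo "missing or empty: $1"
    fi
}
```

Running `report_state_file "$PKG_STATE"; report_state_file "$RC_STATE"` before and after a reboot would show whether the state really disappears, for instance with a RAM disk in use.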
-
@chudak Yup. That's exactly what we need for this to be reliable.