SG-3100 rebooting
-
SG-3100, updated 2 months ago to 22.05. I can't reach the web interface, so this is from memory:
Packages:
- Cron
- openvpn-client-export
- Patch
There is a Cron job which pings a list of IPs, and if all are unavailable takes the WAN interface DOWN/UP.
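For anyone wanting to reproduce this kind of setup, here is a minimal sketch of such a watchdog. It is not the actual job described above. The WAN device name (mvneta2, which I believe is the SG-3100 WAN port), the script path and the example addresses are all assumptions, so check the real WAN device name with ifconfig first.

```sh
#!/bin/sh
# Hypothetical WAN watchdog, meant to be run from the Cron package, e.g.:
#   /root/wan_watchdog.sh 192.0.2.1 198.51.100.1
# WAN_IF is an assumption (believed to be the SG-3100 WAN NIC); verify with ifconfig.
WAN_IF="${WAN_IF:-mvneta2}"

# Probe one host with a single ping; on FreeBSD, -t is an overall timeout in seconds.
probe() {
    ping -c 1 -t 2 "$1" >/dev/null 2>&1
}

# Succeeds (returns 0) only if every host passed as an argument fails its probe.
all_unreachable() {
    for host in "$@"; do
        if probe "$host"; then
            return 1
        fi
    done
    return 0
}

# Only act when hosts were given on the command line and none of them answered.
if [ "$#" -gt 0 ] && all_unreachable "$@"; then
    logger -t wan_watchdog "all probes failed, cycling $WAN_IF"
    ifconfig "$WAN_IF" down
    ifconfig "$WAN_IF" up
fi
```

Requiring every host to fail, rather than any single one, avoids bouncing the WAN just because one probe target is down.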
I have 3 OVPN servers and 1 OVPN client, which connects to Private Internet Access (PIA).
A week ago I noticed that I could not access the web interface. Since then the unit has been rebooting periodically, and with increasing frequency (sometimes every 30 min). It routes, but the traffic speed over the PIA tunnel is very unstable, and traffic speed in general is somewhat unstable.
I retrieved the system logs via console, but am unsure how to proceed. Here are some snippets:
Nov 3 02:31:16 gateway kernel: ---<<BOOT>>---
.
Nov 3 02:32:39 gateway php[403]: rc.bootup: dpinger: status socket /var/run/dpinger_OVPN10194_USER_VPNV4~10.200.3.1~10.200.3.1.sock not found
Nov 3 02:32:39 gateway php[403]: rc.bootup: dpinger: status socket /var/run/dpinger_OVPN443_USER_VPNV4~10.200.1.1~10.200.1.1.sock not found
Nov 3 02:32:39 gateway php[403]: rc.bootup: dpinger: status socket /var/run/dpinger_OVPN9194_ADMIN_VPNV4~10.200.2.1~10.200.2.1.sock not found
Nov 3 02:32:39 gateway kernel: ....
Nov 3 02:32:40 gateway kernel: .
Nov 3 02:32:40 gateway kernel: done.
Nov 3 02:32:40 gateway kernel: done.
Nov 3 02:32:41 gateway php[403]: rc.bootup: Gateway, NONE AVAILABLE
Nov 3 02:32:41 gateway kernel: done.
Nov 3 02:32:42 gateway php[403]: rc.bootup: sync unbound done.
Nov 3 02:32:42 gateway kernel: done.
Nov 3 02:32:42 gateway php[403]: rc.bootup: There is something wrong in the config because user password is missing!
Nov 3 02:32:42 gateway php[403]: rc.bootup: The command '/usr/sbin/pw groupdel 'OVPN'' returned exit code '65', the output was 'pw: unknown group `OVPN''
Nov 3 02:32:42 gateway kernel: done.
.
Nov 3 02:33:09 gateway nginx: 2022/11/03 02:33:09 [emerg] 56330#100178: bind() to 0.0.0.0:443 failed (48: Address already in use)
Nov 3 02:33:09 gateway nginx: 2022/11/03 02:33:09 [emerg] 56330#100178: bind() to 0.0.0.0:443 failed (48: Address already in use)
Nov 3 02:33:09 gateway nginx: 2022/11/03 02:33:09 [emerg] 56330#100178: bind() to 0.0.0.0:443 failed (48: Address already in use)
Nov 3 02:33:09 gateway nginx: 2022/11/03 02:33:09 [emerg] 56330#100178: bind() to 0.0.0.0:443 failed (48: Address already in use)
Nov 3 02:33:09 gateway nginx: 2022/11/03 02:33:09 [emerg] 56330#100178: bind() to 0.0.0.0:443 failed (48: Address already in use)
Nov 3 02:33:09 gateway nginx: 2022/11/03 02:33:09 [emerg] 56330#100178: still could not bind()
Nov 3 02:33:12 gateway php[403]: rc.bootup: The command '/usr/local/sbin/nginx -c /var/etc/nginx-webConfigurator.conf' returned exit code '1', the output was 'nginx: [emerg] bind() to 0.0.0.0:443 failed (48: Address already in use) nginx: [emerg] bind() to 0.0.0.0:443 failed (48: Address already in use) nginx: [emerg] bind() to 0.0.0.0:443 failed (48: Address already in use) nginx: [emerg] bind() to 0.0.0.0:443 failed (48: Address already in use) nginx: [emerg] bind() to 0.0.0.0:443 failed (48: Address already in use) nginx: [emerg] still could not bind()'
Nov 3 02:33:12 gateway kernel: failed!
And then there is a whole lot of the following, complaining about all the IP ranges the unit uses (WAN, OVPN servers) and many it doesn't!
Nov 3 02:36:55 gateway php-fpm[385]: /rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 10.200.2.1 -> 10.200.2.1 - Restarting packages.
Nov 3 02:36:55 gateway check_reload_status[399]: Starting packages
Nov 3 02:36:56 gateway php-fpm[80374]: /rc.newwanip: rc.newwanip: Info: starting on ovpnc4.
Nov 3 02:36:56 gateway php-fpm[80374]: /rc.newwanip: rc.newwanip: on (IP address: 10.30.110.95) (interface: PIA[opt26]) (real interface: ovpnc4).
Nov 3 02:36:56 gateway php-fpm[682]: /rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 84.73.89.100 -> 84.73.89.100 - Restarting packages.
Nov 3 02:36:56 gateway check_reload_status[399]: Starting packages
... and then it reboots again.
Obviously the "There is something wrong in the config" message is concerning. But what is the best way to proceed? I do have a backup of the config.
Thank you!
Edit: I happen to have another SG-3100 here. I just tried loading my backed-up config onto it, and now I cannot access its web interface either. When I connect over the console and select option 11, "Restart webConfigurator", I get:
gateway nginx: 2022/11/03 10:01:07 [emerg] 14448#100192: bind() to 0.0.0.0:443 failed (48: Address already in use)
Maybe the config is bad?
-
@axxxxe said in SG-3100 rebooting:
gateway nginx: 2022/11/03 10:01:07 [emerg] 14448#100192: bind() to 0.0.0.0:443 failed (48: Address already in use)
Warnings like that are not necessarily fatal. The webgui tries to listen on all interfaces and it looks like at least two of them don't have an address yet so it fails on the second attach. But I would expect it to still be listening on the interfaces that do have an IP.
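One way to see which addresses the web GUI is configured to listen on is to look at the nginx configuration whose path appears in the log above (/var/etc/nginx-webConfigurator.conf). A small sketch; the helper name is made up:

```sh
#!/bin/sh
# Print the 'listen' directives, with line numbers, from the nginx config
# used for the web GUI. webgui_listen_lines is a hypothetical helper name.
webgui_listen_lines() {
    conf="${1:-/var/etc/nginx-webConfigurator.conf}"
    # Match lines whose first non-blank word is 'listen' (skips comments).
    grep -n -E '^[[:space:]]*listen[[:space:]]' "$conf"
}

# Only run it where the file exists, i.e. on the firewall itself.
if [ -r /var/etc/nginx-webConfigurator.conf ]; then
    webgui_listen_lines
fi
```

In nginx, a listen directive that gives only a port (for example "listen 443 ssl;") binds the wildcard address, which would show up as 0.0.0.0:443 in a bind() error.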
I would guess the user password issue comes from one of the OpenVPN clients being set to expect a username/password that is missing.
Steve
-
@stephenw10
Can you suggest how I can diagnose why the webGUI is not working? It is not available on either of the two interfaces that allow it.
-
Can you access those interfaces otherwise? Ping, ssh etc?
What interfaces are shown as failing to get an IP at the console?
-
@stephenw10
I can ping and SSH on both interfaces. Only the OVPN tunnel fails to get an IP.
-
Can you see nginx running?
Check the nginx logs in /var/log/nginx.log
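To check whether every nginx error in that log is the same bind failure, here is a short sketch that counts bind() errors per address:port. It assumes the plain-text log format quoted in this thread and counts at most one failure per line; the helper name is hypothetical.

```sh
#!/bin/sh
# Count nginx "bind() to ADDR:PORT failed" errors per ADDR:PORT.
# Only the first failure on each line is counted. bind_failures is a made-up name.
bind_failures() {
    awk '
        match($0, /bind\(\) to [^ ]+ failed/) {
            # "bind() to " is 10 characters long and " failed" is 7.
            addr = substr($0, RSTART + 10, RLENGTH - 17)
            count[addr]++
        }
        END { for (a in count) print count[a], a }
    ' "${1:-/var/log/nginx.log}"
}

if [ -r /var/log/nginx.log ]; then
    bind_failures
fi
```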
-
@stephenw10 said in SG-3100 rebooting:
/var/log/nginx.log
The only entry in that log from today is:
Nov 3 02:33:09 gateway nginx: 2022/11/03 02:33:09 [emerg] 56330#100178: bind() to 0.0.0.0:443 failed (48: Address already in use)
... and the unit has rebooted itself several times today.
Edit: My mistake, there are actually several lines like the above for today. All messages from today are failures to bind.
-
Anything in /var/log/nginx/error.log ?
And do you see nginx running?
[22.05-RELEASE][root@1100-2.stevew.lan]/root: ps -aux | grep nginx
root 433 0.0 4.3 137168 43232 - I 15:04 0:34.49 php-fpm: pool nginx (php-fpm)
root 434 0.0 4.3 137276 43012 - I 15:04 0:35.13 php-fpm: pool nginx (php-fpm)
root 643 0.0 4.3 136776 43240 - I 15:04 0:57.10 php-fpm: pool nginx (php-fpm)
root 653 0.0 4.3 133960 42920 - I 15:04 0:58.32 php-fpm: pool nginx (php-fpm)
root 22487 0.0 0.8 27488 7632 - Is 15:14 0:00.00 nginx: master process /usr/local/sbin/nginx -c /var/etc/nginx-webConfigurator.conf (nginx)
root 22544 0.0 0.8 28400 8480 - I 15:14 0:04.04 nginx: worker process (nginx)
root 22799 0.0 0.9 28480 8552 - I 15:14 0:02.79 nginx: worker process (nginx)
root 15452 0.0 0.2 11000 2504 u0 S+ 16:24 0:00.01 grep nginx
-
@stephenw10
[22.05-RELEASE][root@gateway.home.localdomain]/var/log: ps -aux | grep nginx
root 385 0.0 1.5 93592 30708 - I 05:00 0:14.08 php-fpm: pool nginx (php-fpm)
root 386 0.0 1.5 93620 30672 - I 05:00 0:14.14 php-fpm: pool nginx (php-fpm)
root 682 0.0 1.5 93584 31152 - I 05:00 0:14.33 php-fpm: pool nginx (php-fpm)
root 12681 0.0 1.5 93644 30752 - I 05:06 0:13.39 php-fpm: pool nginx (php-fpm)
root 19185 0.0 1.5 93592 30700 - I 05:06 0:14.04 php-fpm: pool nginx (php-fpm)
root 82767 0.0 1.5 93584 30564 - I 05:06 0:12.51 php-fpm: pool nginx (php-fpm)
root 94004 0.0 1.5 93604 30268 - I 05:06 0:12.32 php-fpm: pool nginx (php-fpm)
root 1237 0.0 0.1 4756 2292 u0 S+ 17:25 0:00.01 grep nginx
/var/log/nginx/error.log has no entries since April 2021.
-
Hmm, so not running.
Well, I would look at the OpenVPN client first. If it's an assigned interface but never gets created, that could be causing it. Does it show in ifconfig?
-
@stephenw10
ovpnc4: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
options=80000<LINKSTATE>
inet6 fe80::208:a2ff:fe0d:eb7e%ovpnc4 prefixlen 64 scopeid 0x28
inet 10.14.110.78 --> 10.14.110.1 netmask 0xffffff00
groups: tun openvpn
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Opened by PID 95554
Without the WebGUI I am lost when it comes to changing the config...
-
Hmm, that looks fine. So are any interfaces shown as 0.0.0.0?
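A sketch that answers this from ifconfig output: it lists every interface with no IPv4 address other than 0.0.0.0. The helper name is hypothetical, and interfaces that are normally address-less (unused ports, for example) will be listed too.

```sh
#!/bin/sh
# Read 'ifconfig' output on stdin and print each interface that has no
# IPv4 address other than 0.0.0.0. Interface header lines start in
# column 0; address lines are indented.
interfaces_without_ipv4() {
    awk '
        function flush() { if (name != "" && !has4) print name }
        /^[^ \t]/ { flush(); name = $1; sub(/:$/, "", name); has4 = 0; next }
        $1 == "inet" && $2 != "0.0.0.0" { has4 = 1 }
        END { flush() }
    '
}

if command -v ifconfig >/dev/null 2>&1; then
    ifconfig | interfaces_without_ipv4
fi
```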
-
@stephenw10
No.
[22.05-RELEASE][root@gateway.home.localdomain]/var/log: ifconfig | grep 'inet '
inet 10.123.123.1 netmask 0xffffff00 broadcast 10.123.123.255
inet 84.73.89.100 netmask 0xfffff800 broadcast 255.255.255.255
inet 127.0.0.1 netmask 0xff000000
inet 192.168.5.1 netmask 0xffffff00 broadcast 192.168.5.255
inet 192.168.4.1 netmask 0xffffff00 broadcast 192.168.4.255
inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
inet 192.168.100.1 netmask 0xffffff00 broadcast 192.168.100.255
inet 192.168.101.1 netmask 0xffffff00 broadcast 192.168.101.255
inet 192.168.102.1 netmask 0xffffff00 broadcast 192.168.102.255
inet 192.168.103.1 netmask 0xffffff00 broadcast 192.168.103.255
inet 192.168.104.1 netmask 0xffffff00 broadcast 192.168.104.255
inet 192.168.105.1 netmask 0xffffff00 broadcast 192.168.105.255
inet 192.168.106.1 netmask 0xffffff00 broadcast 192.168.106.255
inet 192.168.107.1 netmask 0xffffff00 broadcast 192.168.107.255
inet 172.27.0.1 netmask 0xffffff00 broadcast 172.27.0.255
inet 192.168.108.1 netmask 0xffffff00 broadcast 192.168.108.255
inet 192.168.109.1 netmask 0xffffff00 broadcast 192.168.109.255
inet 192.168.110.1 netmask 0xffffff00 broadcast 192.168.110.255
inet 192.168.111.1 netmask 0xffffff00 broadcast 192.168.111.255
inet 192.168.112.1 netmask 0xffffff00 broadcast 192.168.112.255
inet 192.168.113.1 netmask 0xffffff00 broadcast 192.168.113.255
inet 192.168.114.1 netmask 0xffffff00 broadcast 192.168.114.255
inet 192.168.115.1 netmask 0xffffff00 broadcast 192.168.115.255
inet 192.168.10.1 netmask 0xffffff00 broadcast 192.168.10.255
inet 192.168.11.1 netmask 0xffffff00 broadcast 192.168.11.255
inet 192.168.12.1 netmask 0xffffff00 broadcast 192.168.12.255
inet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255
inet 10.200.2.1 --> 10.200.2.2 netmask 0xffffff00
inet 10.200.3.1 --> 10.200.3.2 netmask 0xffffff00
inet 10.200.1.1 --> 10.200.1.2 netmask 0xffffff00
inet 10.14.110.78 --> 10.14.110.1 netmask 0xffffff00
-
0.0.0.0 can also be a placeholder for 'all IPs'. If it can't bind to that, is there something else already listening on port 443? HAProxy perhaps? One of the OpenVPN servers?
What does 'sockstat -l' show?
-
@stephenw10 said in SG-3100 rebooting:
sockstat -l
[22.05-RELEASE][root@gateway.home.localdomain]/var/log: sockstat -l | grep 443
root dpinger 87220 6 stream /var/run/dpinger_OVPN443_USER_VPNV4~10.200.1.1~10.200.1.1.sock
root openvpn 90098 6 tcp4 84.73.89.100:443 *:*
Yes, there is an OpenVPN server listening on 443. 84.73.89.100 is the WAN interface. Is it bad for me to have set it up this way?
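Grepping sockstat output for "443" also matches unrelated lines, such as the dpinger socket whose path happens to contain OVPN443. A sketch of a filter that only matches the port in the local-address column (the sixth field of sockstat -l output); the helper name is hypothetical:

```sh
#!/bin/sh
# Read 'sockstat -l' output on stdin and print COMMAND, PID, PROTO and
# LOCAL ADDRESS for sockets whose local address ends in ":<port>".
# Unix-domain sockets (file paths) and the header line never match.
listeners_on_port() {
    awk -v port="$1" '$6 ~ (":" port "$") { print $2, $3, $5, $6 }'
}

# sockstat is FreeBSD-specific, so only call it where it exists.
if command -v sockstat >/dev/null 2>&1; then
    sockstat -l | listeners_on_port 443
fi
```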
-
@axxxxe said in SG-3100 rebooting:
Is it bad
Yes.
Only one server process can listen on a given port. Once it has started, another server can't use that port.
You could use UDP as the protocol for your OpenVPN server on port 443. Since nginx, the pfSense web server, uses port 443 over TCP, that will be OK.
Or move the pfSense web GUI to another port, like 8080.
Or just leave the OpenVPN on the default 1194 port.
-
@gertjan
Ok, thanks for the helpful info. I find it bizarre that this config worked for years and only in the last 2 weeks suddenly broke the WebGUI.
Two questions:
How do I fix this from the console, at least so I can get in over the web interface and adjust the OVPN server settings?
Does this explain the random rebooting of the unit?
-
@axxxxe said in SG-3100 rebooting:
How do I fix this from the console, at least so I can get in over the web interface and adjust the OVPN server settings?
You have console access working?
Then it's easy: edit the pfSense master config file /cf/conf/config.xml.
Locate the OpenVPN server settings; you'll see the port number "443". Change that to "1234".
Save.
Reboot.
All is well.
There is a command that can help you:
viconfig
(if you know what 'vi' is)
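Before opening the file in an editor it can help to know which line to change. This sketch prints the line number and value of each <local_port> element inside an <openvpn-server> block of config.xml. The element names are my understanding of how pfSense stores OpenVPN server settings, so treat them as an assumption and check the file itself; the helper name is hypothetical.

```sh
#!/bin/sh
# Print "LINE: PORT" for every <local_port> inside an <openvpn-server> block.
# Element names are assumed; openvpn_server_ports is a made-up helper name.
openvpn_server_ports() {
    awk '
        /<openvpn-server>/   { in_srv = 1 }
        /<\/openvpn-server>/ { in_srv = 0 }
        in_srv && /<local_port>/ {
            port = $0
            # Strip everything up to <local_port> and from </local_port> on.
            gsub(/.*<local_port>|<\/local_port>.*/, "", port)
            print NR ": " port
        }
    ' "${1:-/cf/conf/config.xml}"
}

if [ -r /cf/conf/config.xml ]; then
    openvpn_server_ports
fi
```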
@axxxxe said in SG-3100 rebooting:
Does this explain the random rebooting of the unit?
The GUI web server is also used to 'execute' many PHP scripts that pfSense needs for its housekeeping tasks.
With the GUI not running, what the system does becomes... undefined, as far as I can tell.
-
I would say it doesn't explain the rebooting.
I also find it odd that it prevented the webgui process from starting at all. I would have expected it to be unavailable on the WAN address only. However, that is a conflict, so I would start by removing it.
You can use the Easy Editor, ee, instead of vi if you're not a masochist! It's built in.
Or you can use the 'Set interface(s) IP address' function from the console menu. If you set one of the interfaces to the same address it already has, it will ask if you wish to revert the webgui to http on port 80. You can then connect there and make other changes in the gui.
Steve
-
@stephenw10 said in SG-3100 rebooting:
Or you can use the 'Set interface(s) IP address' function from the menu. If you set one of the interfaces to the same address it already has it will ask if you wish to revert the webgui to http on port 80. You can then connect there and make other changes in the gui.
I followed this route. Once in over the WebGUI, I just deleted the OVPN server since I didn't actually need it anymore. After a reboot the system came up with far less complaining in the log. Fingers crossed that the issue is resolved.
I have to say that it seems like a bug to me that one can break the system by listening on 443 on the WAN port. This didn't used to be the case. I've had that OVPN server configured to listen on 443 since at least January of 2018 and until recently there was no issue. Note that I only upgraded from 2.4.5 to 22.05 two months ago. Maybe this issue was introduced in 22.05?
Anyway, thank you Steve and Gertjan for the help!