Rapid DOWN-UP of WAN PHY + IP change causes complete loss of WAN sometimes
-
This has been plaguing me for a long time and now with 2.4.0 stable it's been easier to try to tease out what might be going wrong.
The setup:
LAN <–--> SG-1000 <----> HFC Cable Modem
NOTE: I have zero control over the cable modem as it is controlled by the infrastructure provider (NBN Australia).The problem is that Internet connectivity will sometimes completely disappear and not come back unless the SG-1000 is power cycled (have not confirmed yet whether it can be accessed via web console when it is in this state).
From studying what happens in multiple logs it seems that:
-
ISP sets WAN IP via DHCP with 150 second renew time.
-
Seemingly randomly (2-3 times a day or once a week) WAN IP is changed.
-
A moment before the IP is changed the cable modem will in under one second force the link to pfsense DOWN then UP.
-
From this point on, it's all bets are off. Either:
a) ~ 50% of the time IP will update and WAN connection stays
b) ~ 50% of the time WAN dies. Internet cannot be accessed from the LAN and also the unit cannot be pinged from the Internet.
In the case where Internet access dies, it seems that the new IP sticks, but some other odd things happen. Combined logs:
07:35:35 dhclient: RENEW 07:35:35 dhclient: Creating resolv.conf 07:38:05 dhclient: RENEW 07:38:05 dhclient: Creating resolv.conf 07:40:35 dhclient: RENEW 07:40:35 dhclient: Creating resolv.conf 07:43:05 dhclient: RENEW 07:43:05 dhclient: Creating resolv.conf 07:45:35 dhclient: RENEW 07:45:35 dhclient: Creating resolv.conf 07:48:05 dhclient: RENEW 07:48:05 dhclient: Creating resolv.conf 07:49:35 kernel: cpsw0: link state changed to DOWN 07:49:35 kernel: cpsw0: link state changed to UP 07:49:35 check_reload_status: Linkup starting cpsw0 07:49:35 check_reload_status: Linkup starting cpsw0 07:49:36 php-fpm[88066]: /rc.linkup: DEVD Ethernet detached event for wan 07:49:36 dhclient[9971]: connection closed 07:49:36 dhclient[9971]: exiting. 07:49:37 miniupnpd[58525]: ioctl(s, SIOCGIFADDR, ...): Can't assign requested address 07:49:37 miniupnpd[58525]: Failed to get IP for interface cpsw0 07:49:37 miniupnpd[58525]: SendNATPMPPublicAddressChangeNotification: cannot get public IP address, stopping 07:49:37 check_reload_status: Reloading filter 07:49:37 php-fpm[13091]: /rc.linkup: DEVD Ethernet attached event for wan 07:49:37 php-fpm[13091]: /rc.linkup: HOTPLUG: Configuring interface wan 07:49:38 miniupnpd[58525]: SendNATPMPPublicAddressChangeNotification: sendto(s_udp=10): No route to host 07:49:38 dhclient: PREINIT 07:49:38 dhclient[59642]: DHCPREQUEST on cpsw0 to 255.255.255.255 port 67 07:49:38 dhclient[59642]: DHCPNAK from ##ISP GATEWAY## 07:49:38 dhclient[59642]: DHCPDISCOVER on cpsw0 to 255.255.255.255 port 67 interval 1 07:49:39 miniupnpd[58525]: ioctl(s, SIOCGIFADDR, ...): Can't assign requested address 07:49:39 miniupnpd[58525]: Failed to get IP for interface cpsw0 07:49:39 miniupnpd[58525]: SendNATPMPPublicAddressChangeNotification: cannot get public IP address, stopping 07:49:39 miniupnpd[58525]: SendNATPMPPublicAddressChangeNotification: sendto(s_udp=10): No route to host 07:49:39 dhclient[59642]: DHCPDISCOVER on cpsw0 to 255.255.255.255 port 67 interval 2 07:49:39 dhclient[59642]: DHCPOFFER from ##ISP GATEWAY## 07:49:39 dhclient: ARPSEND 07:49:39 dhclient[59642]: DHCPOFFER from ##ISP GATEWAY## 07:49:39 dhclient[59642]: DHCPOFFER already seen. 07:49:39 dhclient[59642]: DHCPOFFER from ##ISP GATEWAY 2## 07:49:39 dhclient: ARPSEND 07:49:39 dhclient: ARPCHECK 07:49:39 dhclient[59642]: DHCPREQUEST on cpsw0 to 255.255.255.255 port 67 07:49:39 dhclient[59642]: DHCPACK from ##ISP GATEWAY## 07:49:39 dhclient: BOUND 07:49:39 dhclient: Starting add_new_address() 07:49:39 dhclient: ifconfig cpsw0 inet ##WAN IP## netmask 255.255.0.0 broadcast ##WAN BROADCAST## 07:49:39 dhclient: New IP Address (cpsw0): ##WAN IP## 07:49:39 dhclient: New Subnet Mask (cpsw0): 255.255.0.0 07:49:39 dhclient: New Broadcast Address (cpsw0): ##WAN BROADCAST## 07:49:39 dhclient: New Routers (cpsw0): ##ISP GATEWAY## 07:49:39 dhclient: Adding new routes to interface: cpsw0 07:49:40 dnsmasq[97316]: exiting on receipt of SIGTERM 07:49:40 dhclient: /sbin/route add default ##ISP GATEWAY## 07:49:40 dhclient: Creating resolv.conf 07:49:40 dhclient[59642]: bound to ##WAN IP## -- renewal in 150 seconds. 07:49:40 check_reload_status: rc.newwanip starting cpsw0 07:49:40 php-fpm[13091]: /rc.linkup: ROUTING: setting default route to ##ISP GATEWAY## 07:49:40 check_reload_status: Restarting ipsec tunnels 07:49:41 dhcpleases: /etc/hosts changed size from original! 07:49:41 dhcpleases: Sending HUP signal to dns daemon(97316) 07:49:41 dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/dnsmasq.pid) does not exist, No such process. 07:49:41 kernel: arpresolve: can't allocate llinfo for ##ISP GATEWAY 2## on cpsw0 07:49:41 php-fpm[88066]: /rc.newwanip: rc.newwanip: Info: starting on cpsw0. 07:49:41 kernel: arpresolve: can't allocate llinfo for ##ISP GATEWAY 2## on cpsw0 07:49:41 php-fpm[88066]: /rc.newwanip: rc.newwanip: on (IP address: ##WAN IP##) (interface: WAN[wan]) (real interface: cpsw0). 07:49:41 dhcpleases: /etc/hosts changed size from original! 07:49:41 php-fpm[88066]: /rc.newwanip: IP Address has changed, killing states on former IP Address 106.71.88.7. 07:49:41 dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/dnsmasq.pid) does not exist, No such process. 07:49:42 dhcpleases: /etc/hosts changed size from original! 07:49:42 dhcpleases: Sending HUP signal to dns daemon(97316) 07:49:42 dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/dnsmasq.pid) does not exist, No such process. 07:49:42 dhcpleases: kqueue error: unkown 07:49:42 dhcpleases: Sending HUP signal to dns daemon(71274) 07:49:42 dnsmasq[71274]: started, version 2.78 cachesize 10000 07:49:42 dnsmasq[71274]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth DNSSEC loop-detect no-inotify 07:49:42 dnsmasq[71274]: reading /etc/resolv.conf 07:49:42 dnsmasq[71274]: using nameserver ##ISP DNS###53 07:49:42 dnsmasq[71274]: using nameserver ##ISP DNS 2###53 07:49:42 dnsmasq[71274]: read /etc/hosts - 2 addresses 07:49:42 dnsmasq[71274]: read /etc/hosts - 4 addresses 07:49:42 dhcpleases: /etc/hosts changed size from original! 07:49:42 dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/dnsmasq.pid) does not exist, No such process. 07:49:42 dhcpleases: kqueue error: unkown 07:49:53 rc.gateway_alarm[85709]: >>> Gateway alarm: WAN_DHCP (Addr:##ISP GATEWAY 2## Alarm:1 RTT:16620ms RTTsd:2417ms Loss:22%) 07:49:53 check_reload_status: updating dyndns WAN_DHCP 07:49:53 check_reload_status: Restarting ipsec tunnels 07:49:53 check_reload_status: Restarting OpenVPN tunnels/interfaces 07:49:53 check_reload_status: Reloading filter
After all that the IP lease is renew via DHCP although the behaviour is a bit different:
07:52:09 dhclient: Creating resolv.conf 07:52:09 dhclient[66767]: bound to ##WAN IP## -- renewal in 150 seconds. 07:54:39 dhclient[66767]: DHCPREQUEST on cpsw0 to ##ISP GATEWAY## port 67 07:54:40 dhclient[66767]: DHCPACK from ##ISP GATEWAY##
This is a remote site so it is hard to debug, but I thought of trying to maybe recreate the issue by:
- Unplugging then replugging the WAN cable
- Power cycling the cable modem with a 1 minute off time
I've found several issues and threads which are similar but the solution either didn't work for me or it's been patched. The fact that it is 50/50 makes me wonder if this is some race condition in the rc.linkup and/or rc.newwanip scripts which pop up in the logs.
Any ideas, thoughts on what it might be or other ways I can dig up information would be greatly appreciated.
Thanks
EDIT 1:
- I get the same behaviour with unbound instead of dnsmasq
- Internet does not go down with the ISP supplied crappy router
EDIT 2:
Currently running 2.4.0-RELEASE (arm) -
-
This is the system log of the fail event but also up until the unit was power cycled. I noticed further down shortly before the power cycle several of these messages:
08:26:06 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed
08:26:15 kernel: cpsw1: link state changed to DOWN
08:26:15 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed
08:26:27 kernel: cpsw1: link state changed to UPwhere cpsw1 is the LAN
Also:
08:31:37 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed
08:31:37 kernel: cpsw0: link state changed to DOWN
08:31:37 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed
08:31:40 kernel: cpsw0: link state changed to UP
08:31:40 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failedwhere cpsw0 is the WAN.
Didn't really take note of it earlier because I was so focused on the messages at the failure time, but maybe they mean something. There's not a lot of information on the net about these messages.
Full system.log from start of failure until power cycle:
07:49:35 kernel: cpsw0: link state changed to DOWN 07:49:35 kernel: cpsw0: link state changed to UP 07:49:35 check_reload_status: Linkup starting cpsw0 07:49:35 check_reload_status: Linkup starting cpsw0 07:49:36 php-fpm[88066]: /rc.linkup: DEVD Ethernet detached event for wan 07:49:37 check_reload_status: Reloading filter 07:49:37 php-fpm[13091]: /rc.linkup: DEVD Ethernet attached event for wan 07:49:37 php-fpm[13091]: /rc.linkup: HOTPLUG: Configuring interface wan 07:49:40 check_reload_status: rc.newwanip starting cpsw0 07:49:40 php-fpm[13091]: /rc.linkup: ROUTING: setting default route to ###ISP GATEWAY 1### 07:49:40 check_reload_status: Restarting ipsec tunnels 07:49:41 kernel: arpresolve: can't allocate llinfo for ###ISP GATEWAY 2### on cpsw0 07:49:41 php-fpm[88066]: /rc.newwanip: rc.newwanip: Info: starting on cpsw0. 07:49:41 kernel: arpresolve: can't allocate llinfo for ###ISP GATEWAY 2### on cpsw0 07:49:41 php-fpm[88066]: /rc.newwanip: rc.newwanip: on (IP address: ###NEW WAN IP###) (interface: WAN[wan]) (real interface: cpsw0). 07:49:41 dhcpleases: /etc/hosts changed size from original! 07:49:41 php-fpm[88066]: /rc.newwanip: IP Address has changed, killing states on former IP Address 106.71.88.7. 07:49:41 dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/dnsmasq.pid) does not exist, No such process. 07:49:42 dhcpleases: /etc/hosts changed size from original! 07:49:42 dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/dnsmasq.pid) does not exist, No such process. 07:49:42 dhcpleases: kqueue error: unkown 07:49:53 rc.gateway_alarm[85709]: >>> Gateway alarm: WAN_DHCP (Addr:###ISP GATEWAY 2### Alarm:1 RTT:16620ms RTTsd:2417ms Loss:22%) 07:49:53 check_reload_status: updating dyndns WAN_DHCP 07:49:53 check_reload_status: Restarting ipsec tunnels 07:49:53 check_reload_status: Restarting OpenVPN tunnels/interfaces 07:49:53 check_reload_status: Reloading filter 07:51:36 sshd[82454]: Timeout, client not responding. 07:59:46 check_reload_status: Linkup starting cpsw1 07:59:46 kernel: cpsw1: link state changed to DOWN 08:00:06 kernel: cpsw1: link state changed to UP 08:00:06 check_reload_status: Linkup starting cpsw1 08:00:15 check_reload_status: Linkup starting cpsw1 08:00:15 kernel: cpsw1: link state changed to DOWN 08:00:28 check_reload_status: Linkup starting cpsw1 08:00:28 kernel: cpsw1: link state changed to UP 08:25:43 kernel: cpsw1: link state changed to DOWN 08:25:43 check_reload_status: Linkup starting cpsw1 08:26:06 kernel: cpsw1: link state changed to UP 08:26:06 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:26:15 kernel: cpsw1: link state changed to DOWN 08:26:15 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:26:27 kernel: cpsw1: link state changed to UP 08:26:27 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:30:51 kernel: cpsw1: link state changed to DOWN 08:30:51 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:30:55 kernel: cpsw0: link state changed to DOWN 08:30:55 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:31:14 kernel: cpsw1: link state changed to UP 08:31:14 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:31:20 kernel: cpsw1: link state changed to DOWN 08:31:20 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:31:25 kernel: cpsw0: link state changed to UP 08:31:25 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:31:37 kernel: cpsw1: link state changed to UP 08:31:37 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:31:37 kernel: cpsw0: link state changed to DOWN 08:31:37 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:31:40 kernel: cpsw0: link state changed to UP 08:31:40 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:31:43 kernel: cpsw0: link state changed to DOWN 08:31:43 kernel: sonewconn: pcb 0xc3abe840: pru_attach() failed 08:33:08 syslogd: kernel boot file is /boot/kernel/kernel 08:33:08 kernel: Copyright (c) 1992-2017 The FreeBSD Project. 08:33:08 kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
[
-
Does your wan Access come back if you power cycle your cable modem without rebooting PFSense? I have, what may be, a similar issue but I can restore internet by rebooting only
The cable modem. It started the day after I upgraded to 2.4 and unfortunately it is still randomly happening after 2.4.1 - I’m considering rolling back for a while until the dust settles more on 2.4