Weekly Loss of WAN after 2.2.2 upgrade
-
Hey all,
Since the upgrade to 2.2.2 I lose my WAN IP roughly once a week. Everything comes back up again when rebooting though.
This is what I am faced with when I log in to troubleshoot:
What log information might be useful to troubleshoot this?
Appreciate any help!
Thanks,
Matt
-
What does the system log show? Looks like DHCP isn't able to obtain a lease. dhclient should be logging something relevant.
-
@cmb:
What does the system log show? Looks like DHCP isn't able to obtain a lease. dhclient should be logging something relevant.
Thank you.
At the point this has happened, it has had a negotiated IP on the WAN for - in this case - 6 days and 20 hours, so just about a week. Do you think it is failing at renegotiating?
The log page in the GUI is only showing me last 50 entries, which doesn't go back far enough.
Is there a file in /var/log I can look at instead? What can I grep for? The interface name?
I've tried a few different things (like grepping for dhcp, or for my WAN interface, em1, etc.), but I'm not sure if I'm finding anything relevant.
This comes up repeatedly at about the same time as the problem (or just before?). I've anonymized it to hide my IP but you get the point:
/var/log/gateways.log:Apr 13 13:12:28 home-router apinger: Could not bind socket on address(xxx.xxx.xxx.28) for monitoring address xxx.xxx.xxx.1(WAN_DHCP) with error Can't assign requested address
Does this sound like it might be it?
There are pages upon pages of this same error in a row, one every second.
Appreciate any thoughts!
–Matt
-
'clog /var/log/system.log' will dump the full system.log. Or you can increase the number of lines shown on the Settings tab under Status>System logs.
The apinger log is a symptom of the problem, it has no IP to bind to. It sounds like DHCP renewals are failing for some reason.
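To narrow the dump down to DHCP and WAN interface messages, something like the following could help. This is only a sketch: `wan_log` is a hypothetical helper name, and em1 is assumed to be the WAN interface as stated earlier in the thread.

```shell
# Hypothetical helper: keep only the log lines on stdin that mention
# dhclient or the given interface (defaults to em1, the WAN NIC here).
wan_log() {
  iface="${1:-em1}"
  grep -E "dhclient|${iface}"
}

# Intended use on the firewall (not run here):
#   clog /var/log/system.log | wan_log em1
```

Piping through `tail -n 200` afterwards keeps only the most recent matches if the output is long.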
-
@cmb:
'clog /var/log/system.log' will dump the full system.log. Or you can increase the number of lines shown on the Settings tab under Status>System logs.
The apinger log is a symptom of the problem, it has no IP to bind to. It sounds like DHCP renewals are failing for some reason.
Thank you.
Looked at the system log. This looks relevant:
Jun 4 19:30:42 home-router kernel: arpresolve: can't allocate llinfo for xxx.xxx.xxx.1 on em1
Again, repeated very often for several pages, as many as 16 times per second at about the time of the issue.
Like before, this is probably a symptom, not the problem itself. I went up to see what the last message was before arpresolve started spamming the log and found this:
Jun 4 19:27:45 home-router check_reload_status: updating dyndns WAN_DHCP
Jun 4 19:27:45 home-router check_reload_status: Restarting ipsec tunnels
Jun 4 19:27:45 home-router check_reload_status: Restarting OpenVPN tunnels/interfaces
Jun 4 19:27:45 home-router check_reload_status: Reloading filter
Jun 4 19:27:46 home-router php-fpm[98346]: /rc.dyndns.update: phpDynDNS (an.address.com): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Jun 4 19:27:47 home-router bandwidthd: DNS timeout for xxx.xxx.xxx.242: This problem reduces graphing performance
Jun 4 19:27:47 home-router bandwidthd: DNS timeout for xxx.xxx.xxx.242: This problem reduces graphing performance
Jun 4 19:28:24 home-router check_reload_status: Linkup starting em1
Jun 4 19:28:24 home-router kernel: em1: Watchdog timeout -- resetting
Jun 4 19:28:24 home-router kernel: em1: Queue(0) tdh = 135, hw tdt = 104
Jun 4 19:28:24 home-router kernel: em1: TX(0) desc avail = 31,Next TX to Clean = 135
Jun 4 19:28:24 home-router kernel: em1: link state changed to DOWN
Jun 4 19:28:25 home-router php-fpm[95237]: /rc.linkup: DEVD Ethernet detached event for wan
Jun 4 19:28:26 home-router kernel: arpresolve: can't allocate llinfo for xxx.xxx.xxx.1 on em1
Jun 4 19:28:27 home-router kernel: arpresolve: can't allocate llinfo for xxx.xxx.xxx.1 on em1
Jun 4 19:28:27 home-router kernel: arpresolve: can't allocate llinfo for xxx.xxx.xxx.1 on em1
Jun 4 19:28:27 home-router kernel: arpresolve: can't allocate llinfo for xxx.xxx.xxx.1 on em1
Jun 4 19:28:27 home-router kernel: arpresolve: can't allocate llinfo for xxx.xxx.xxx.1 on em1
Jun 4 19:28:27 home-router kernel: arpresolve: can't allocate llinfo for xxx.xxx.xxx.1 on em1
Jun 4 19:28:27 home-router kernel: arpresolve: can't allocate llinfo for xxx.xxx.xxx.1 on em1
Jun 4 19:28:27 home-router kernel: arpresolve: can't allocate llinfo for xxx.xxx.xxx.1 on em1
So, it looks like dyndns check runs, followed by something for bandwidthd, and finally the watchdog loses touch with em1, it goes down, and doesn't come up again, followed by the log spams from arpresolve.
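For anyone repeating this, scrolling back to the last messages before the arpresolve flood can be roughly automated. The sketch below is an assumption on my part, not something from the thread: `before_flood` is a made-up helper name, and it simply prints the N lines that precede the first arpresolve message on stdin.

```shell
# Hypothetical helper: print the N lines (default 15) that come right
# before the first "arpresolve:" message read from stdin.
before_flood() {
  awk -v n="${1:-15}" '
    /arpresolve:/ {
      # Replay the ring buffer in original order, then stop reading.
      for (i = NR - n; i < NR; i++)
        if (i > 0) print buf[i % n]
      exit
    }
    { buf[NR % n] = $0 }   # remember the most recent n lines
  '
}

# Intended use on the firewall (not run here):
#   clog /var/log/system.log | before_flood 15
```

It only looks at the first flood in the log, so if the log covers several outages you would need to trim the input to the time window of interest first.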
It's odd that all these things happen right before my issue. Are all these executed on a regular interval, one after another as part of a cron script? If so, what else is executed as part of that script. Something that could be taking down my em1 interface?
I did a "crontab -e" to try to check, but found it empty… ???
Appreciate any help!
Thanks,
Matt
-
Yeah the arpresolve: can't allocate llinfo is a symptom as well. You're losing link for some reason.
This thread (https://forum.pfsense.org/index.php?topic=81929.0) suggests that setting the following in /boot/loader.conf.local will fix it:
hw.pci.enable_msi=0
hw.pci.enable_msix=0
Reboot after creating that file with those lines (or adding them to the file, if you already have one), and see what that does.
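If you would rather do this from a shell than an editor, a minimal sketch follows. `add_tunable` is a hypothetical helper, not a pfSense command; it appends a line to the file only if that exact line is not already there, so running it twice does no harm.

```shell
# Hypothetical helper: append a tunable to a loader.conf.local-style file
# unless that exact line is already present. Creates the file if missing.
add_tunable() {
  file="$1"
  line="$2"
  touch "$file"
  grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Intended use as root on the firewall (not run here):
#   add_tunable /boot/loader.conf.local 'hw.pci.enable_msi=0'
#   add_tunable /boot/loader.conf.local 'hw.pci.enable_msix=0'
```

After the reboot, `sysctl hw.pci.enable_msi hw.pci.enable_msix` should report 0 for both if the settings were picked up.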
-
Thank you for this suggestion!
I will try it and report back within a couple of weeks.
-
Follow up question:
Will these changes to loader.conf be persistent between web interface invoked automatic upgrades, or am I going to have to keep in mind that I need to make this change every time an upgrade comes out?
Thanks,
Matt
-
Yes, that's what the .local is for ;)
-
Disable your GW monitoring on DHCP connections.
I disabled mine and many things have been better since.
-
Disable your GW monitoring on DHCP connections.
That has nothing at all to do with a NIC losing link. In a default config, gateway monitoring does nothing but log response times.
Will these changes to loader.conf be persistent between web interface invoked automatic upgrades, or am I going to have to keep in mind that I need to make this change every time an upgrade comes out?
That's why you put it in loader.conf.local, not loader.conf. The .local file will never be overwritten by an upgrade.
-
Yes, that's what the .local is for ;)
@cmb:
That's why you put it in loader.conf.local, not loader.conf. The .local file will never be overwritten by an upgrade.
Thanks guys. Sometimes my Linux experience is a great help in the BSDs. Sometimes it just helps me get in over my head faster.
So many things are similar, but so many are also different.
Appreciate it!
–Matt