sendto error: 65

newUser2pfSense

So I'm having a lot of issues with this even after the second call to my ISP. I'm just wondering: if the internet goes down, pfSense will of course log it and I won't be able to connect, but will pfSense sense (for lack of a better term) when the internet comes back up and allow connectivity again? I'm just trying to narrow down whether pfSense could be causing this; I don't believe it is. Any ideas, anyone?

KOM

Yes, it should reconnect when everything comes back up. There was an issue about DHCP client timeouts that prevented reconnection in some cases.

newUser2pfSense

Luckily, most of my equipment is IP-based except for a couple of devices. I wrote a Python program that checks internet connectivity and writes any loss of connectivity to a log file so I can show my ISP, not that they really care to see the log file.
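The program itself isn't posted in the thread, so here is only a minimal sketch of that kind of checker, not the author's code. It assumes that a TCP connection to a public resolver (1.1.1.1 on port 53, an arbitrary choice) counts as "internet up", and it logs only up/down transitions so the file stays short enough to hand to an ISP.

```python
import logging
import socket
import time
from typing import Callable, List, Optional

# Assumed probe target (not from the thread): any reliably reachable
# host/port outside your own network would do.
PROBE_HOST = "1.1.1.1"
PROBE_PORT = 53


def internet_reachable(host: str = PROBE_HOST, port: int = PROBE_PORT,
                       timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failure, ...
        return False


def monitor(probe: Callable[[], bool], interval: float, log: logging.Logger,
            cycles: Optional[int] = None,
            sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call `probe` every `interval` seconds and log only up/down transitions.

    Runs forever when `cycles` is None; returns the messages it logged.
    """
    events: List[str] = []
    was_up = True
    count = 0
    while cycles is None or count < cycles:
        up = probe()
        if up != was_up:
            msg = "internet restored" if up else "internet unreachable"
            log.warning(msg)  # the logging format supplies the timestamp
            events.append(msg)
            was_up = up
        count += 1
        sleep(interval)
    return events
```

For a real run, configure a file first, e.g. `logging.basicConfig(filename="connectivity.log", format="%(asctime)s %(message)s")`, then call `monitor(internet_reachable, 30, logging.getLogger("netmon"))`. The `probe` and `sleep` parameters are injectable only so the loop can be exercised without a network.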

When this issue occurs, the pfSense dashboard shows the gateway as offline. Oddly enough, it has only been occurring at night. Why, then, do I have to restart pfSense to reconnect to the internet? Maybe I'm stupid here, but wouldn't it stand to reason that if the signal stops somewhere outside my home, of course I couldn't get on the internet, and once the signal was restored I could get back on without having to restart pfSense? It's as if the signal is trying to get through pfSense but "something" is keeping it from getting through. Hopefully that makes sense.

KOM

Sounds like that DHCP issue I mentioned earlier. I can't seem to find the post but it had to do with the Advanced config of your WAN interface, and playing with the Protocol timing options. Sorry I can't be more specific.

newUser2pfSense

Thanks for that info. I think I found the post:
https://forum.netgate.com/topic/96923/pfsense-not-recovering-from-wan-failure

I found a few other posts from other sites as well. I'll change the Protocol timing Timeout and Retry times and see what happens.

Interestingly, why does it happen only at night or in the early morning hours?

newUser2pfSense

So I changed the Protocol timing settings to -
Timeout: 3600
Retry: 3

This didn't work. Any further ideas? Maybe some more favorable settings for the Protocol timing?

KOM

My last remaining suggestion is to do a packet capture (Diagnostics - Packet Capture) when you're having the problem to see what's really going on. That's how the other DHCP timeout error was discovered. Post the cap here and someone can look at it to help you figure out the real issue.

newUser2pfSense

While pfSense was reporting the gateway as being offline, I ran a packet capture on the WAN. There were no packets captured. I may have to run a packet capture overnight since it seems to occur more during the early morning hours, after midnight.

KOM

There should have been something. You left it at the default of WAN, Any protocol, any address?

newUser2pfSense

There were no captures at all. I made sure the settings were WAN, any protocol, and any address.

newUser2pfSense

I don't know if there is going to be a solid solution to this or not. However, seeing that the Protocol timing doesn't seem to be working, it would be nice if there was an installable System Package that would monitor the gateway and do something like release/renew automagically when it goes offline. Just a thought.
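No such package is named in this thread, so the following is only a hypothetical sketch of the idea: ping the gateway and run a recovery action after several consecutive failures. `GATEWAY` is a placeholder documentation address and the recovery action is left to the caller; neither is a real pfSense setting or command. Requiring several misses in a row is a design choice so that one dropped ping doesn't trigger a release/renew.

```python
import subprocess
from typing import Callable

GATEWAY = "192.0.2.1"  # hypothetical placeholder address, not from the thread


def ping_once(host: str, timeout: float = 5.0) -> bool:
    """Send one ICMP echo with the system `ping`; True if it exits with status 0."""
    try:
        result = subprocess.run(["ping", "-c", "1", host],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):  # hung, or no ping binary
        return False
    return result.returncode == 0


class GatewayWatchdog:
    """Invoke `recover` after `threshold` consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], recover: Callable[[], None],
                 threshold: int = 3) -> None:
        self.probe = probe
        self.recover = recover
        self.threshold = threshold
        self.failures = 0

    def tick(self) -> bool:
        """Run one probe; return True if recovery was triggered on this tick."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0  # count afresh after each recovery attempt
            self.recover()
            return True
        return False
```

Usage would look like `GatewayWatchdog(lambda: ping_once(GATEWAY), recover=restart_wan)` with `tick()` called on a timer, where `restart_wan` is whatever hypothetical action you trust to bounce the WAN DHCP client.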

Troubleshooting continues.

newUser2pfSense

I did a Packet Capture on my WAN but for whatever reason pfSense will not allow me to download it; it's only 130 MB. After I try to download it, pfSense locks up and I have to restart it. If I knew the file's path, I could SFTP into pfSense and download it.

In the Status > System Logs > DHCP, I've found a lot of the following entries. My WAN is on igb7.

dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 21
dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 18
dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 11
dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 7

dhclient 92079 send_packet: Host is down

I'm not sure of the relevance of those entries but it seemed to have something to do with what's going on. Maybe not?
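One way to read those lines is to split them apart with a couple of regular expressions. This is a sketch that assumes the message text exactly as shown above. Once separated, two things are visible: each DHCPDISCOVER line reports a different retransmission interval, and the send_packet error comes from a different process ID (92079 rather than 94735), i.e. a separate dhclient process.

```python
import re
from typing import Iterable, List, Tuple

# Patterns assume the message format pfSense displayed in this thread.
DISCOVER_RE = re.compile(
    r"dhclient (\d+) DHCPDISCOVER on (\S+) to \S+ port 67 interval (\d+)")
SEND_ERR_RE = re.compile(r"dhclient (\d+) send_packet: (.+)")


def scan_dhclient_log(lines: Iterable[str]) -> Tuple[List[Tuple[int, str, int]],
                                                     List[Tuple[int, str]]]:
    """Return (pid, iface, interval) for DISCOVERs and (pid, error) for send failures."""
    discovers: List[Tuple[int, str, int]] = []
    errors: List[Tuple[int, str]] = []
    for line in lines:
        m = DISCOVER_RE.search(line)
        if m:
            discovers.append((int(m.group(1)), m.group(2), int(m.group(3))))
            continue
        m = SEND_ERR_RE.search(line)
        if m:
            errors.append((int(m.group(1)), m.group(2).strip()))
    return discovers, errors


LOG = """\
dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 21
dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 18
dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 11
dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 7
dhclient 92079 send_packet: Host is down
"""

discovers, errors = scan_dhclient_log(LOG.splitlines())
for pid, iface, interval in discovers:
    print(f"pid {pid} on {iface}: DISCOVER, interval {interval}s")
for pid, err in errors:
    print(f"pid {pid}: send_packet failed ({err})")
```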

stephenw10

The last packet capture is in /root/packetcapture.cap

newUser2pfSense

stephenw10...thanks for the path. I was able to download it to my Ubuntu desktop.

I've imported the cap file into Wireshark. Now, it's not that I haven't used Wireshark in the past, I mean way past, but in this case for the issue I'm trying to resolve, I'm not sure what I should be looking for.

stephenw10

If the pcap shows it is continually broadcasting dhcp requests then it looks like it's something at the other end.

If it stops sending after a while you could be hitting the referenced dhclient bug that required the timeout increase. It will still timeout and fail to restart though if so.

Steve
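Steve's two cases can also be checked with a small script that pulls the DHCP packets out of the capture and measures how long the client has been silent at the end of it. This is a standard-library sketch under several assumptions: the file is classic microsecond libpcap format (anything else, including pcapng, is rejected), frames are Ethernet, and DHCP means IPv4 UDP with both ports in {67, 68}. It is far less thorough than Wireshark's dissector.

```python
import struct
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple

DHCP_PORTS = {67, 68}
MAGIC_COOKIE = b"\x63\x82\x53\x63"


class DhcpPacket(NamedTuple):
    ts: float                # capture timestamp, seconds since the epoch
    src_port: int            # 68 = sent by the client, 67 = sent by a server
    msg_type: Optional[int]  # DHCP option 53 (1 = DISCOVER, 3 = REQUEST, 5 = ACK)


def read_pcap(data: bytes) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, frame) from a classic libpcap file with Ethernet framing."""
    (magic,) = struct.unpack("<I", data[:4])
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond libpcap file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("only Ethernet (linktype 1) captures are handled")
    off = 24
    while off + 16 <= len(data):
        sec, usec, incl_len, _ = struct.unpack(endian + "IIII", data[off:off + 16])
        off += 16
        yield sec + usec / 1e6, data[off:off + incl_len]
        off += incl_len


def _dhcp_msg_type(bootp: bytes) -> Optional[int]:
    """Return the value of DHCP option 53, or None if it isn't present."""
    if len(bootp) < 240 or bootp[236:240] != MAGIC_COOKIE:
        return None
    i = 240
    while i + 1 < len(bootp) and bootp[i] != 255:  # 255 = end option
        if bootp[i] == 0:                          # 0 = pad option, no length byte
            i += 1
            continue
        code, length = bootp[i], bootp[i + 1]
        if code == 53 and length >= 1 and i + 2 < len(bootp):
            return bootp[i + 2]
        i += 2 + length
    return None


def dhcp_packets(frames: Iterable[Tuple[float, bytes]]) -> List[DhcpPacket]:
    """Keep IPv4/UDP frames whose source and destination ports are both 67 or 68."""
    out: List[DhcpPacket] = []
    for ts, frame in frames:
        if len(frame) < 34:
            continue
        ethertype, ip = struct.unpack("!H", frame[12:14])[0], 14
        if ethertype == 0x8100:  # skip a single 802.1Q VLAN tag
            ethertype, ip = struct.unpack("!H", frame[16:18])[0], 18
        if ethertype != 0x0800 or frame[ip + 9] != 17:  # IPv4 carrying UDP only
            continue
        udp = ip + (frame[ip] & 0x0F) * 4
        if len(frame) < udp + 8:
            continue
        sport, dport = struct.unpack("!HH", frame[udp:udp + 4])
        if sport in DHCP_PORTS and dport in DHCP_PORTS:
            out.append(DhcpPacket(ts, sport, _dhcp_msg_type(frame[udp + 8:])))
    return out


def client_silence(packets: List[DhcpPacket], capture_end: float) -> Optional[float]:
    """Seconds between the client's last DHCP transmission and the end of the capture."""
    sent = [p.ts for p in packets if p.src_port == 68]
    return capture_end - sent[-1] if sent else None
```

With the downloaded file, something like `frames = list(read_pcap(Path("packetcapture.cap").read_bytes()))` followed by `client_silence(dhcp_packets(frames), frames[-1][0])` gives the silence. A short silence during an outage would fit the first case (still broadcasting); a long one while the gateway is still down would fit the second (the client stopped).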

newUser2pfSense

I wrote a Python program a while ago that goes out and checks for internet connectivity every so many seconds. The program writes to a log file when the internet is unreachable. In this particular instance, the internet became unreachable between 07:06am and 07:07am. I've changed the display times in Wireshark to be able to see the dates and times as we would normally see them.

I'm not sure I'm seeing any entries that mention or describe DHCP. I'm looking at the time frame when internet connectivity was lost, just before and after.

After filtering for DHCP traffic (udp.port == 68), the only entries I'm seeing are:
20633 2019-11-29 23:25:15.602566 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
52915 2019-11-30 00:25:15.623471 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
69178 2019-11-30 01:25:15.288596 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
84088 2019-11-30 02:25:15.350459 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
100715 2019-11-30 03:25:15.408722 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
116291 2019-11-30 04:25:15.462404 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
167247 2019-11-30 05:25:15.512665 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
183374 2019-11-30 06:25:15.560319 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f

The times seem to correspond hourly to the Protocol timing Timeout, which I changed to 3600 (1 hour).
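The hourly pattern can be confirmed by differencing the timestamps from the rows above; every gap comes out within a second of 3600 s. The spacing alone can't show whether it comes from the Timeout setting or from the renewal schedule of the lease the ISP's DHCP server hands out, so checking the lease time option inside one of those ACKs would help tell the two apart.

```python
from datetime import datetime
from typing import List

# Timestamps copied from the DHCP ACK rows listed above.
ACK_TIMES = [
    "2019-11-29 23:25:15.602566",
    "2019-11-30 00:25:15.623471",
    "2019-11-30 01:25:15.288596",
    "2019-11-30 02:25:15.350459",
    "2019-11-30 03:25:15.408722",
    "2019-11-30 04:25:15.462404",
    "2019-11-30 05:25:15.512665",
    "2019-11-30 06:25:15.560319",
]


def gaps_seconds(stamps: List[str]) -> List[float]:
    """Seconds between consecutive 'YYYY-MM-DD HH:MM:SS.ffffff' timestamps."""
    parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]


for gap in gaps_seconds(ACK_TIMES):
    print(f"{gap:.3f} s")
```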

stephenw10

Well those TCP ACK packets look to be replies from some external site so I'd suggest the WAN was not down at that moment.
I'd expect to see it start sending DHCP requests if the WAN went down, unless the link actually failed...

newUser2pfSense

Sorry, I just changed my reply, up two posts. Those were the only DHCP entries I could find. Sorry for the confusion.

newUser2pfSense

I just found this post which may be a temporary answer. I haven't read all of it yet.
https://redmine.pfsense.org/issues/9267

After reading it, it looks a little too complicated for me to try. Looks like I'll have to wait for the next pfSense release for this to be fixed.

stephenw10

Yup, that's the bug we are referring to. The dhcp client fails to handle that error correctly and instead of simply retrying until it gets a reply it just stops. If the cause of that is the upstream device booting slowly for example then simply increasing the timeout there can get past it. But if there is some bigger delay it can be a problem.

Steve
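The behaviour Steve describes, retrying on a send error rather than stopping, can be sketched in Python. This is not dhclient's code and not the fix from the bug report; it only illustrates the pattern. Which errors count as transient is an assumption: on FreeBSD, which pfSense is built on, errno 65 (the "sendto error: 65" in the thread title) is EHOSTUNREACH and "Host is down" is EHOSTDOWN. The sketch uses Python's symbolic errno names, so the numeric values follow whatever platform it runs on.

```python
import errno
import time
from typing import Callable, Optional

# Assumption for this sketch: errors meaning "the link or next hop isn't ready
# yet" (e.g. an upstream modem still booting) are worth retrying.
TRANSIENT = {errno.EHOSTDOWN, errno.EHOSTUNREACH, errno.ENETDOWN, errno.ENETUNREACH}


def send_with_retry(send: Callable[[], None], initial: float = 1.0,
                    cap: float = 60.0, max_attempts: Optional[int] = None,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it succeeds, backing off on transient errors.

    Doubles the delay after each failure up to `cap` seconds; retries forever
    unless `max_attempts` is given. Returns the number of attempts made.
    Non-transient errors are re-raised immediately.
    """
    delay = initial
    attempt = 0
    while True:
        attempt += 1
        try:
            send()
            return attempt
        except OSError as exc:
            if exc.errno not in TRANSIENT:
                raise
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, cap)
```

Capping the backoff keeps the client probing at a steady rate during a long upstream outage, so it notices the link returning within about `cap` seconds instead of giving up.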