sendto error: 65



  • Good news.



  • Well, the ONT was replaced. I'll see how it goes for the next however many days.



  • Well, I wish I could report that the new ONT didn't drop the internet, but it did. I have a call back into Verizon.



  • So I'm still having a lot of issues with this, even after the second call to my ISP. If the internet goes down, pfSense will of course log it and I won't be able to connect. But will pfSense sense (for lack of a better term) when the internet comes back up and restore connectivity? I'm trying to narrow down whether pfSense could be causing this; I don't believe so. Any ideas, anyone?



  • Yes, it should reconnect when everything comes back up. There was an issue about DHCP client timeouts that prevented reconnection in some cases.



  • Luckily most of my equipment is IP based except for a couple of devices. I wrote a Python program to check internet connectivity and write any loss of internet connectivity to a log file so I can show my ISP, not that they really care to see the log file.

    When this issue occurs, the pfSense dashboard shows the gateway as offline. Oddly enough, it has only been occurring at night. What I don't understand is why I have to restart pfSense to reconnect to the internet. Maybe I'm missing something, but if the signal stops somewhere outside my home, of course I can't get on the internet; once the signal is restored, shouldn't I be able to get back on without restarting pfSense? It's as if the signal is trying to get through pfSense but "something" is keeping it from getting through. Hopefully that makes sense.
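
A connectivity logger like the one described above could look roughly like this. It is only a sketch, not the poster's actual program: the probe target `1.1.1.1:53`, the 30-second interval, and the log file name are all assumptions chosen for illustration. It records only state changes, so the start and end of each outage stand out in the log.

```python
import socket
import time
from datetime import datetime

# Assumptions for illustration only: probe target, interval, and log path.
PROBE_HOST = "1.1.1.1"
PROBE_PORT = 53
CHECK_INTERVAL_S = 30
LOG_PATH = "internet_outages.log"


def is_online(host=PROBE_HOST, port=PROBE_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def format_event(now, online):
    """Build one log line: a timestamp followed by UP or DOWN."""
    return f"{now:%Y-%m-%d %H:%M:%S} {'UP' if online else 'DOWN'}"


def monitor(check=is_online, interval=CHECK_INTERVAL_S, log_path=LOG_PATH,
            max_checks=None, sleep=time.sleep, clock=datetime.now):
    """Poll `check` and append a log line whenever the state changes.

    With max_checks=None the loop runs until interrupted.
    """
    last_state = None
    checks = 0
    while max_checks is None or checks < max_checks:
        online = check()
        if online != last_state:
            with open(log_path, "a") as fh:
                fh.write(format_event(clock(), online) + "\n")
            last_state = online
        checks += 1
        sleep(interval)
```

Calling `monitor()` with no arguments starts the loop. Run it from a machine behind pfSense so the log reflects what LAN clients actually experience.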



  • Sounds like that DHCP issue I mentioned earlier. I can't seem to find the post but it had to do with the Advanced config of your WAN interface, and playing with the Protocol timing options. Sorry I can't be more specific.



  • Thanks for that info. I think I found the post:
    https://forum.netgate.com/topic/96923/pfsense-not-recovering-from-wan-failure

    I found a few other posts from other sites as well. I'll change the Protocol timing Timeout and Retry times and see what happens.

    Interestingly, I wonder why it does it only at night or early morning hours?



  • So I changed the Protocol timing settings to -
    Timeout: 3600
    Retry: 3

    This didn't work. Any further ideas? Maybe some more favorable settings for the Protocol timing?



  • My last remaining suggestion is to do a packet capture (Diagnostics - Packet Capture) when you're having the problem to see what's really going on. That's how the other DHCP timeout error was discovered. Post the cap here and someone can look at it to help you figure out the real issue.



  • While pfSense was reporting the gateway as being offline, I ran a packet capture on the WAN. There were no packets captured. I may have to run a packet capture overnight since it seems to occur more during the early morning hours, after midnight.



  • There should have been something. You left it at the default of WAN, Any protocol, any address?



  • There were no captures at all. I made sure the settings were WAN, any protocol and any address.



  • I don't know if there is going to be a solid solution to this or not. However, seeing that the Protocol timing doesn't seem to be working, it would be nice if there was an installable System Package that would monitor the gateway and do something like release/renew automagically when it goes offline. Just a thought.

    Troubleshooting continues.
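
The watchdog idea above can be sketched in a few lines. This is not an existing pfSense package: `GatewayWatchdog` and `RENEW_COMMAND` are hypothetical names, and the command shown is a harmless placeholder because the thread does not confirm which command renews the WAN lease on pfSense. The sketch only covers the decision logic: act after several consecutive failed checks, then wait out a cooldown before acting again.

```python
import subprocess
import time

# Hypothetical placeholder: substitute whatever command renews the WAN lease
# on your system. This thread does not confirm a specific pfSense command.
RENEW_COMMAND = ["echo", "renew WAN lease here"]


class GatewayWatchdog:
    """Run a recovery action after several consecutive failed checks."""

    def __init__(self, check, recover, failures_needed=3, cooldown_s=300,
                 clock=time.monotonic):
        self.check = check                  # callable returning True when online
        self.recover = recover              # callable that attempts recovery
        self.failures_needed = failures_needed
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.last_recovery = None

    def tick(self):
        """Perform one check; return True if the recovery action was run."""
        if self.check():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures < self.failures_needed:
            return False
        now = self.clock()
        # Avoid hammering the upstream device: respect the cooldown.
        if self.last_recovery is not None and now - self.last_recovery < self.cooldown_s:
            return False
        self.recover()
        self.last_recovery = now
        self.failures = 0
        return True


def run_renew_command():
    """Run the placeholder command; failures are ignored on purpose."""
    subprocess.run(RENEW_COMMAND, check=False)
```

A caller would build `GatewayWatchdog(check=..., recover=run_renew_command)` and call `tick()` on a timer. Requiring several consecutive failures keeps a single dropped probe from triggering a renew.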



  • I did a Packet Capture on my WAN but for whatever reason pfSense will not allow me to download it; it's only 130 MB. After I try to download it, pfSense locks up and I have to restart it. If I knew the file's path, I could SFTP into pfSense and download it.

    In the Status > System Logs > DHCP, I've found a lot of the following entries. My WAN is on igb7.

    dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 21
    dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 18
    dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 11
    dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 7

    dhclient 92079 send_packet: Host is down

    I'm not sure of the relevance of those entries, but they seem to have something to do with what's going on. Maybe not?
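
To see at a glance how often dhclient is retrying and which errors it reports, entries like the ones above can be summarized with a short script. This is a sketch that assumes the log lines have exactly the shape quoted in this post; `summarize` is a hypothetical helper, not part of pfSense.

```python
import re

# Patterns match the two kinds of dhclient entries quoted in the post.
DISCOVER_RE = re.compile(
    r"dhclient (\d+) DHCPDISCOVER on (\S+) to (\S+) port (\d+) interval (\d+)"
)
ERROR_RE = re.compile(r"dhclient (\d+) send_packet: (.+)")


def summarize(lines):
    """Split dhclient log lines into DHCPDISCOVER attempts and send errors."""
    discovers, errors = [], []
    for line in lines:
        m = DISCOVER_RE.search(line)
        if m:
            discovers.append(
                {"pid": int(m.group(1)), "iface": m.group(2), "interval": int(m.group(5))}
            )
            continue
        m = ERROR_RE.search(line)
        if m:
            errors.append({"pid": int(m.group(1)), "error": m.group(2).strip()})
    return discovers, errors


sample = [
    "dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 21",
    "dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 18",
    "dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 11",
    "dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 7",
    "dhclient 92079 send_packet: Host is down",
]
discovers, errors = summarize(sample)
print(len(discovers), "DISCOVER attempts on", {d["iface"] for d in discovers})
print("errors:", [e["error"] for e in errors])
```

Repeated DHCPDISCOVER lines with no matching offer mean the client is asking and nothing upstream is answering.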


  • Netgate Administrator

    The last packet capture is in /root/packetcapture.cap



  • stephenw10...thanks for the path. I was able to download it to my Ubuntu desktop.

    I've imported the cap file into Wireshark. Now, it's not that I haven't used Wireshark in the past, I mean way past, but in this case for the issue I'm trying to resolve, I'm not sure what I should be looking for.


  • Netgate Administrator

    If the pcap shows it is continually broadcasting dhcp requests then it looks like it's something at the other end.

    If it stops sending after a while you could be hitting the referenced dhclient bug that required the timeout increase. If so, it will still time out and fail to restart.

    Steve



  • I wrote a Python program a while ago that goes out and checks for internet connectivity every so many seconds. The program writes to a log file when the internet is unreachable. In this particular instance, the internet became unreachable between 07:06am and 07:07am. I've changed the display times in Wireshark to be able to see the dates and times as we would normally see them.

    I'm not sure that I'm seeing any entries that mention or describe DHCP. I'm looking at the time frame in which internet connectivity was lost, just before and after.

    After a search for DHCP traffic (udp.port == 68), the only entries I'm seeing are:
    20633 2019-11-29 23:25:15.602566 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    52915 2019-11-30 00:25:15.623471 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    69178 2019-11-30 01:25:15.288596 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    84088 2019-11-30 02:25:15.350459 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    100715 2019-11-30 03:25:15.408722 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    116291 2019-11-30 04:25:15.462404 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    167247 2019-11-30 05:25:15.512665 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    183374 2019-11-30 06:25:15.560319 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f

    The times seem to correspond hourly to the Protocol timing Timeout I changed to 3600, 1 hour.
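
The hourly spacing can be checked directly from the timestamps listed above. The sketch below only measures the gaps between consecutive ACKs; it cannot tell whether the one-hour cadence comes from the Timeout setting or from ordinary lease renewal, so treat that part as an open question.

```python
from datetime import datetime

# Timestamps copied from the DHCP ACK rows in the capture above.
ACK_TIMES = [
    "2019-11-29 23:25:15.602566",
    "2019-11-30 00:25:15.623471",
    "2019-11-30 01:25:15.288596",
    "2019-11-30 02:25:15.350459",
    "2019-11-30 03:25:15.408722",
    "2019-11-30 04:25:15.462404",
    "2019-11-30 05:25:15.512665",
    "2019-11-30 06:25:15.560319",
]


def intervals_seconds(stamps):
    """Return the gap in seconds between each pair of consecutive timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in stamps]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


gaps = intervals_seconds(ACK_TIMES)
for gap in gaps:
    print(f"{gap:.3f} s")
```

Every gap comes out within one second of 3600. The last ACK, at 06:25:15, arrived roughly 41 minutes before the outage logged between 07:06 and 07:07.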


  • Netgate Administrator

    Well those TCP ACK packets look to be replies from some external site so I'd suggest the WAN was not down at that moment.
    I expect to see it start to send DHCP requests if the WAN went down, unless the link actually failed...



  • Sorry, I just changed my reply, up two posts. Those were the only DHCP entries I could find. Sorry for the confusion.



  • I just found this post which may be a temporary answer. I haven't read all of it yet.
    https://redmine.pfsense.org/issues/9267


    After reading, it looks a little too complicated for me to try ☺ . Looks like I'll have to wait for the next pfSense release for this to be fixed.


  • Netgate Administrator

    Yup, that's the bug we are referring to. The dhcp client fails to handle that error correctly and instead of simply retrying until it gets a reply it just stops. If the cause of that is the upstream device booting slowly for example then simply increasing the timeout there can get past it. But if there is some bigger delay it can be a problem.

    Steve



  • stephenw10...Thanks for that information. It's great to know the issue is being addressed. I just hope as Jim Pingle mentioned in the redmine link, it's not much longer now. I just can't wait for it to be resolved. Thanks for the assistance.



  • It just hit me to ask. Since this issue affects the user base of pfSense, have the pfSense developers thought of releasing a fix at least just for this issue before the next version of pfSense is released? This would at least alleviate a lot of restart frustration. Just wondering.



  • In System > Update, would anyone happen to know if the fix for the dhclient is in the "Latest 2.4.x development version (2.4.5)", Latest Base System 2.4.5.a.20101222.1312? I would be willing to install just to fix the issue.

    When I log in to the terminal, would anyone know what the release/renew command is so I don't have to keep restarting?

    Thanks.


  • Netgate Administrator

    It doesn't look like it made 11 stable so not in 2.4.5 either:
    https://github.com/pfsense/FreeBSD-src/commits/RELENG_2_4_5/sbin/dhclient/dhclient.c

    Steve



  • So this is really affecting me. Any idea how to do what is mentioned in this post?
    https://redmine.pfsense.org/issues/9267


  • Netgate Administrator

    You would need to apply those patches, rebuild the binaries on an equivalent FreeBSD system and upload them to pfSense. A non-trivial task.



  • Any ideas on what I can do in the meantime?

