sendto error: 65



  • I'm currently running pfSense 2.4.4-RELEASE-p3 (amd64), FreeBSD 11.2-RELEASE-p10.

    I've been noticing that my interwebs become unreachable at times, and I end up having to restart pfSense manually. I didn't really start looking for the cause because it was more or less intermittent, but now it's almost daily. Most of all, I've noticed that my WAN gateway goes offline, with a ton of entries like the following in Status > System Logs > System > Gateways:

    dpinger WAN_DHCP <IP Address>: sendto error: 65

    I set up a cron job to restart pfSense twice a day in an effort to alleviate the issue, but that doesn't seem to help. I can't say for sure what's causing it, but my WAN IP address does seem to have been changing more frequently for whatever reason, not that this is necessarily what's causing the gateway to go offline.

    Anyone have any suggestions as to what may be occurring and how to fix it?
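For anyone decoding the dpinger line above: the number after "sendto error:" appears to be the errno returned by sendto(2), and on FreeBSD (which pfSense 2.4.4 is built on) errno 65 is EHOSTUNREACH, "No route to host". The minimal Python sketch below parses lines in the form shown above and looks the code up in a small, hand-copied FreeBSD errno table. The table is hard-coded on purpose, because errno numbers differ between operating systems: os.strerror(65) on a Linux desktop returns an unrelated message. The regex and function name are illustrative, not part of pfSense or dpinger.

```python
import re

# A few FreeBSD errno values (from FreeBSD's <sys/errno.h>). These numbers are
# platform-specific, so they are hard-coded instead of using os.strerror().
FREEBSD_ERRNO = {
    50: ("ENETDOWN", "Network is down"),
    51: ("ENETUNREACH", "Network is unreachable"),
    64: ("EHOSTDOWN", "Host is down"),
    65: ("EHOSTUNREACH", "No route to host"),
}

# Matches lines shaped like: dpinger WAN_DHCP <IP Address>: sendto error: 65
DPINGER_RE = re.compile(
    r"dpinger\s+(?P<gw>\S+)\s+(?P<addr>.+?):\s+sendto error:\s+(?P<code>\d+)"
)


def decode_dpinger_line(line):
    """Return (gateway, monitor address, errno name, description) or None."""
    m = DPINGER_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"))
    name, desc = FREEBSD_ERRNO.get(code, ("UNKNOWN", f"errno {code}"))
    return m.group("gw"), m.group("addr"), name, desc
```

Feeding it the line above yields EHOSTUNREACH, i.e. at that moment the firewall had no usable route out of the WAN, which fits the gateway being marked offline.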



  • Gateway monitoring can cause a gateway to be marked as down if it detects high packet loss or high ping. What do you see under Status - Monitoring - Quality (Left axis)? You could try disabling Gateway Monitoring via System - Routing - Edit gateway - Disable Gateway Monitoring and see what happens. Is there anything in your System log around the time this happens?



  • Sorry I haven't been able to get back to you sooner; I've really been digging in, trying to find what's causing the issue. I deleted the two cron job restarts that I created. I disabled Gateway Monitoring as you described, restarted, and then tried a couple of web pages with no response. I then re-enabled Gateway Monitoring and was able to surf around again. Under Status > Monitoring, I'm at the Default and all I see is an Interactive Graph and a Data Summary - I don't see a Quality. I did find the line below in the System Logs, but nothing around it stands out as the cause.

    Nov 10 22:23:01 rc.gateway_alarm 55917 >>> Gateway alarm: WAN_DHCP (Addr:<ip address> Alarm:1 RTT:15.276ms RTTsd:37.242ms Loss:21%)

    Very frustrating to say the least.
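The rc.gateway_alarm line above carries the numbers dpinger used to flag the gateway. A minimal sketch that pulls them out and compares them against pfSense's default "high" thresholds, which are assumed here to be 500 ms latency and 20% packet loss (check the gateway's advanced settings under System > Routing for the real values). Under those assumed defaults, the 21% loss would explain Alarm:1, while the 15 ms RTT would not. Names are illustrative.

```python
import re

# Assumed pfSense default "high" thresholds; your gateway may be set differently.
LATENCY_HIGH_MS = 500.0
LOSS_HIGH_PCT = 20.0

ALARM_RE = re.compile(
    r"Gateway alarm: (?P<gw>\S+) \(Addr:(?P<addr>.+?) Alarm:(?P<alarm>\d) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>[\d.]+)%\)"
)


def parse_gateway_alarm(line):
    """Extract the fields of an rc.gateway_alarm log line, or None if no match."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return {
        "gateway": m.group("gw"),
        "address": m.group("addr"),
        "alarm": m.group("alarm") == "1",
        "rtt_ms": float(m.group("rtt")),
        "rttsd_ms": float(m.group("rttsd")),
        "loss_pct": float(m.group("loss")),
    }


def exceeded_thresholds(alarm):
    """Name the assumed high thresholds this sample is above (> vs >= is a guess)."""
    reasons = []
    if alarm["loss_pct"] > LOSS_HIGH_PCT:
        reasons.append("packet loss")
    if alarm["rtt_ms"] > LATENCY_HIGH_MS:
        reasons.append("latency")
    return reasons
```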



  • @newUser2pfSense said in sendto error: 65:

    I'm at the Default and all I see is an Interactive Graph and a Data Summary - I don't see a Quality.

    Click on the wrench icon at the top-right for options, then change the Left Axis from System to Quality.



  • OK... I have the Quality graph now. I'm going to post a screenshot.

    [Screenshot: Quality.png]



  • That 40% packet loss seems to be a bit of a problem.



  • Indeed it is! I was actually using the interwebs when that happened, so I immediately opened up pfSense and began looking at Status > System Logs. That's when I came across the rc.gateway_alarm entry. It sure would be nice to find what's making the gateway drop off like that. I don't know whether my service provider, Verizon, could be doing something to cause the issue. I typically notice it when I get up in the morning, go to check my email, and can't get out. I then have to restart pfSense. Frustrating.



  • So I just called Verizon's tech support, and by the time I reached the third person, two techs had found in the ONT history that the ONT was throwing alarms and disconnecting. I now have an appointment for a tech to come out and look at the ONT.



  • Good news.



  • Well, the ONT was replaced. I'll see how it goes for the next however many days.



  • Well, I wish I could report that the new ONT didn't drop the internet, but it did. I have a call back into Verizon.



  • So I'm still having a lot of issues with this, even after a second call to my ISP. I'm just wondering: if the internet goes down, pfSense will of course log it and I won't be able to connect, but will pfSense sense (for lack of a better term) when the internet comes back up and allow connectivity again? I'm just trying to narrow down whether pfSense could be causing this; I don't believe it is. Any ideas, anyone?



  • Yes, it should reconnect when everything comes back up. There was an issue with DHCP client timeouts that prevented reconnection in some cases.



  • Luckily, most of my equipment is IP based, except for a couple of devices. I wrote a Python program to check internet connectivity and log any loss of connectivity to a file so I can show my ISP, not that they really care to see the log file.

    When this issue occurs, the pfSense dashboard shows the gateway as offline. Oddly enough, it's only been occurring at night. So why do I have to restart pfSense to reconnect to the internet? Maybe I'm stupid here, but wouldn't it stand to reason that if the signal stops somewhere outside my home, of course I wouldn't be able to get on the internet, and once the signal was restored I'd be able to get back on without having to restart pfSense? It's as if the signal is trying to get through pfSense but "something" is keeping it from getting through. Hopefully that makes sense.
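The Python program mentioned above isn't shown in the thread, so here is a hypothetical minimal version of the same idea: probe an outside host every so often and log only the transitions (down, then back up with the outage length), which keeps the log short enough to hand to an ISP. The probe target (a TCP connection to 1.1.1.1 port 53), the 30-second interval, and all names are assumptions; swap in whatever host and timing you trust.

```python
import socket
import time
from datetime import datetime


def tcp_probe(host="1.1.1.1", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class OutageLogger:
    """Turn a stream of up/down probe results into outage start/end log lines."""

    def __init__(self):
        self.down_since = None

    def record(self, is_up, now):
        """Return a log line when the state changes, else None."""
        stamp = now.strftime("%Y-%m-%d %H:%M:%S")
        if not is_up and self.down_since is None:
            self.down_since = now
            return f"{stamp} internet unreachable"
        if is_up and self.down_since is not None:
            seconds = (now - self.down_since).total_seconds()
            self.down_since = None
            return f"{stamp} internet restored after {seconds:.0f}s"
        return None


def monitor(log_path, probe=tcp_probe, interval=30):
    """Probe forever, appending state changes to log_path."""
    outages = OutageLogger()
    while True:
        line = outages.record(probe(), datetime.now())
        if line is not None:
            with open(log_path, "a") as log:
                log.write(line + "\n")
        time.sleep(interval)
```

Keeping the state tracking in `OutageLogger`, separate from the network probe, makes it easy to test with canned results and to swap in a different probe (ICMP, HTTP) later.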



  • Sounds like that DHCP issue I mentioned earlier. I can't seem to find the post, but it had to do with the Advanced Configuration of your WAN interface and adjusting the Protocol timing options. Sorry I can't be more specific.



  • Thanks for that info. I think I found the post:
    https://forum.netgate.com/topic/96923/pfsense-not-recovering-from-wan-failure

    I found a few other posts from other sites as well. I'll change the Protocol timing Timeout and Retry times and see what happens.

    I do wonder why it only happens at night or in the early morning hours.



  • So I changed the Protocol timing settings to -
    Timeout: 3600
    Retry: 3

    This didn't work. Any further ideas? Maybe some more favorable settings for the Protocol timing?
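As a rough mental model of those two fields, assuming they map onto dhclient.conf's `timeout` and `retry` options as described in dhclient.conf(5): `timeout` is how long the client keeps trying to reach a DHCP server before giving up, and `retry` is how long it waits after giving up before starting again. The sketch below only lays out those discovery windows on a timeline; it ignores the backoff between individual DHCPDISCOVERs and is not dhclient's actual code. The `gives_up_after_first` flag is a hypothetical stand-in for a client that never starts a second window.

```python
def attempt_windows(timeout, retry, horizon, gives_up_after_first=False):
    """Return (start, end) seconds of each discovery window within `horizon`.

    Models the documented dhclient.conf semantics: try for `timeout` seconds,
    then wait `retry` seconds before trying again.
    """
    windows = []
    start = 0
    while start < horizon:
        windows.append((start, min(start + timeout, horizon)))
        if gives_up_after_first:  # hypothetical client that never restarts
            break
        start += timeout + retry
    return windows
```

With the values tried above, `attempt_windows(3600, 3, 4 * 3600)` gives back-to-back one-hour windows separated by 3 seconds, so under this model those settings only help if the client really does start the next window.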



  • My last remaining suggestion is to do a packet capture (Diagnostics - Packet Capture) when you're having the problem to see what's really going on. That's how the other DHCP timeout error was discovered. Post the cap here and someone can look at it to help you figure out the real issue.



  • While pfSense was reporting the gateway as being offline, I ran a packet capture on the WAN. There were no packets captured. I may have to run a packet capture overnight since it seems to occur more during the early morning hours, after midnight.



  • There should have been something. You left it at the default of WAN, Any protocol, any address?



  • There were no captures at all. I made sure the settings were WAN, any protocol and any address.



  • I don't know if there is going to be a solid solution to this or not. However, seeing that the Protocol timing doesn't seem to be working, it would be nice if there was an installable System Package that would monitor the gateway and do something like release/renew automagically when it goes offline. Just a thought.

    Troubleshooting continues.
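The logic such a package might use can be sketched as a small watchdog: count consecutive failed probes, run a recovery action once a threshold is reached, then hold off for a cooldown so it doesn't hammer the link. This is a hypothetical Python sketch, not an existing pfSense package; `probe` and `recover` are placeholders you'd supply, and what the recovery action should actually be on pfSense (for example a release/renew of the WAN) is deliberately left open. The threshold of 3 failures and the 600-second cooldown are arbitrary example values.

```python
import time


class GatewayWatchdog:
    """Run `recover()` after N consecutive failed `probe()` calls, with a cooldown.

    probe() -> bool and recover() -> None are caller-supplied (hypothetical) hooks.
    """

    def __init__(self, probe, recover, failures_needed=3, cooldown=600,
                 clock=time.monotonic):
        self.probe = probe
        self.recover = recover
        self.failures_needed = failures_needed
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.last_recovery = None

    def tick(self):
        """Probe once; return True if the recovery action was triggered."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures < self.failures_needed:
            return False
        now = self.clock()
        if self.last_recovery is not None and now - self.last_recovery < self.cooldown:
            return False  # still cooling down from the previous recovery
        self.recover()
        self.last_recovery = now
        self.failures = 0
        return True
```

The cooldown matters because a recovery action like a DHCP release/renew takes time to complete; without it, a watchdog could keep re-triggering while the link is still coming back.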



  • I did a Packet Capture on my WAN, but for whatever reason pfSense won't let me download it, even though it's only 130 MB. When I try to download it, pfSense locks up and I have to restart it. If I knew the file's path, I could SFTP into pfSense and download it.

    In the Status > System Logs > DHCP, I've found a lot of the following entries. My WAN is on igb7.

    dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 21
    dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 18
    dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 11
    dhclient 94735 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 7

    dhclient 92079 send_packet: Host is down

    I'm not sure of the relevance of those entries, but they seem to have something to do with what's going on. Maybe not?
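One way to read those entries: each "interval N" is how long dhclient waits before its next DHCPDISCOVER broadcast, so read oldest-first they show the client backing off while getting no answer. That assumes the log view lists newest entries first, which the descending 21, 18, 11, 7 suggests. "Host is down" is the FreeBSD message for EHOSTDOWN (errno 64), meaning the send itself failed. A small Python sketch that summarizes such lines, with the newest-first ordering as an explicit assumption; names are illustrative.

```python
import re

DISCOVER_RE = re.compile(
    r"DHCPDISCOVER on (?P<iface>\S+) to \S+ port \d+ interval (?P<interval>\d+)"
)


def summarize_dhclient(lines):
    """Summarize dhclient log lines, assuming they are listed newest first."""
    intervals, interfaces, send_errors = [], set(), []
    for line in lines:
        m = DISCOVER_RE.search(line)
        if m:
            intervals.append(int(m.group("interval")))
            interfaces.add(m.group("iface"))
        elif "send_packet:" in line:
            send_errors.append(line.split("send_packet:", 1)[1].strip())
    intervals.reverse()  # oldest first, under the newest-first assumption
    return {
        "interfaces": sorted(interfaces),
        "intervals": intervals,
        "seconds_waited": sum(intervals),
        "send_errors": send_errors,
    }
```

For the four DHCPDISCOVER lines above, that is roughly a minute of unanswered discovery on igb7, plus one failed send.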


  • Netgate Administrator

    The last packet capture is in /root/packetcapture.cap



  • stephenw10...thanks for the path. I was able to download it to my Ubuntu desktop.

    I've imported the cap file into Wireshark. It's not that I haven't used Wireshark before (I mean way before), but for the issue I'm trying to resolve here, I'm not sure what I should be looking for.


  • Netgate Administrator

    If the pcap shows it continually broadcasting DHCP requests, then it looks like it's something at the other end.

    If it stops sending after a while, you could be hitting the dhclient bug referenced earlier, the one that required the timeout increase. If so, it will still time out and fail to restart, though.

    Steve
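To check this without scrolling through Wireshark (where the display filter `dhcp`, or `bootp` on older versions, does the same job), a rough Python sketch can list every DHCP message type (DISCOVER, OFFER, REQUEST, ACK, ...) in the capture with its timestamp. It assumes the file copied from /root/packetcapture.cap is a classic libpcap file with Ethernet framing, IPv4 and no VLAN tags; pcapng is not handled, and if the capture's packet length was limited, truncated DHCP payloads are skipped. Function names are illustrative.

```python
import struct

# DHCP message types carried in option 53 (RFC 2132).
DHCP_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
              5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}
MAGIC_COOKIE = b"\x63\x82\x53\x63"


def _dhcp_type(frame):
    """Return the DHCP message type of an Ethernet frame, or None if not DHCP."""
    if len(frame) < 14 or frame[12:14] != b"\x08\x00":  # EtherType IPv4
        return None
    ip = frame[14:]
    if len(ip) < 20 or ip[9] != 17:                      # protocol 17 = UDP
        return None
    udp = ip[(ip[0] & 0x0F) * 4:]                         # skip the IP header
    if len(udp) < 8:
        return None
    sport, dport = struct.unpack("!HH", udp[:4])
    if not {sport, dport} <= {67, 68}:                    # BOOTP server/client ports
        return None
    bootp = udp[8:]
    # Fixed 236-byte BOOTP header, then the 4-byte DHCP magic cookie.
    if len(bootp) < 240 or bootp[236:240] != MAGIC_COOKIE:
        return None
    i = 240
    while i + 1 < len(bootp):
        code = bootp[i]
        if code == 0:                                     # pad option
            i += 1
            continue
        if code == 255:                                   # end option
            break
        length = bootp[i + 1]
        if code == 53 and length >= 1 and i + 2 < len(bootp):
            return DHCP_TYPES.get(bootp[i + 2], f"type {bootp[i + 2]}")
        i += 2 + length
    return "BOOTP"                                        # no message-type option


def dhcp_messages(data):
    """List (unix timestamp, DHCP message type) for each DHCP packet in a pcap."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    divisor = 1e9 if magic in (b"\x4d\x3c\xb2\xa1", b"\xa1\xb2\x3c\x4d") else 1e6
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("only Ethernet (link type 1) captures are handled")
    results, offset = [], 24
    while offset + 16 <= len(data):
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        kind = _dhcp_type(frame)
        if kind is not None:
            results.append((ts_sec + ts_frac / divisor, kind))
    return results
```

Usage: `dhcp_messages(open("packetcapture.cap", "rb").read())`. Following Steve's reasoning, a long run of DISCOVERs with no OFFER would point upstream, while DISCOVERs that simply stop would fit the dhclient bug.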



  • I wrote a Python program a while ago that goes out and checks for internet connectivity every so many seconds. The program writes to a log file when the internet is unreachable. In this particular instance, the internet became unreachable between 07:06am and 07:07am. I've changed the display times in Wireshark to be able to see the dates and times as we would normally see them.

    I'm not sure I'm seeing any entries that mention DHCP. I'm looking at the time frame when internet connectivity was lost, just before and after.

    After filtering for DHCP traffic (udp.port == 68), the only entries I see are:
    20633 2019-11-29 23:25:15.602566 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    52915 2019-11-30 00:25:15.623471 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    69178 2019-11-30 01:25:15.288596 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    84088 2019-11-30 02:25:15.350459 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    100715 2019-11-30 03:25:15.408722 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    116291 2019-11-30 04:25:15.462404 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    167247 2019-11-30 05:25:15.512665 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f
    183374 2019-11-30 06:25:15.560319 138.88.12.1 138.88.12.188 DHCP 334 DHCP ACK - Transaction ID 0xa348e56f

    The times seem to correspond hourly to the Protocol timing Timeout I changed to 3600, 1 hour.
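Hourly ACKs are also what routine lease renewal would produce if the ISP hands out a two-hour lease: RFC 2131 has the client renew at T1, which defaults to 50% of the lease time, and each successful renewal restarts the clock. That is only one possible reading, since the lease time isn't visible in the rows above (Wireshark shows it in the ACK's option 51, "IP Address Lease Time"). The 7200-second lease below is a hypothetical value chosen to illustrate the arithmetic.

```python
from datetime import datetime, timedelta


def renewal_offsets(lease_seconds, t1_fraction=0.5, t2_fraction=0.875):
    """Default RFC 2131 renewal (T1) and rebinding (T2) times, in seconds after the ACK."""
    return lease_seconds * t1_fraction, lease_seconds * t2_fraction


def expected_acks(first_ack, lease_seconds, count):
    """Times of successive ACKs if every renewal succeeds right at T1."""
    t1, _t2 = renewal_offsets(lease_seconds)
    return [first_ack + timedelta(seconds=t1 * k) for k in range(count)]


# Hypothetical 2-hour lease, starting from the first ACK in the capture.
schedule = expected_acks(datetime(2019, 11, 29, 23, 25, 15), 7200, 3)
```

Under that assumption the schedule lands on 23:25:15, 00:25:15 and 01:25:15, the same cadence as the captured ACKs.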


  • Netgate Administrator

    Well, those TCP ACK packets look like replies from some external site, so I'd suggest the WAN was not down at that moment.
    I'd expect to see it start sending DHCP requests if the WAN went down, unless the link actually failed...



  • Sorry, I just edited my reply two posts up. Those were the only DHCP entries I could find. Sorry for the confusion.



  • I just found this post which may be a temporary answer. I haven't read all of it yet.
    https://redmine.pfsense.org/issues/9267


    After reading it, it looks a little too complicated for me to try ☺. Looks like I'll have to wait for the next pfSense release for this to be fixed.


  • Netgate Administrator

    Yup, that's the bug we are referring to. The DHCP client fails to handle that error correctly, and instead of simply retrying until it gets a reply, it just stops. If the cause is, for example, the upstream device booting slowly, then simply increasing the timeout there can get past it. But if there is some bigger delay, it can be a problem.

    Steve
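To make the difference Steve describes concrete, here is a hypothetical Python sketch (not dhclient's code, and without the random jitter real clients add) of a discovery loop that treats a failed send, such as "Host is down" while the upstream link is still coming up, as just another unanswered attempt: it backs off and tries again instead of exiting. The three callables are placeholder hooks, and the 3-second initial interval and 15-second cap are illustrative values only.

```python
def discover_until_reply(send_discover, wait_for_offer, sleep,
                         initial_interval=3.0, backoff_cutoff=15.0,
                         max_attempts=None):
    """Send DHCPDISCOVERs until an offer arrives; return it, or None if max_attempts runs out.

    send_discover() may raise OSError (e.g. EHOSTDOWN); wait_for_offer(t) blocks
    up to t seconds and returns an offer or None; sleep(t) waits t seconds.
    """
    interval = initial_interval
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            send_discover()
        except OSError:
            # The failure mode discussed above: keep going instead of stopping.
            sleep(interval)
        else:
            offer = wait_for_offer(interval)
            if offer is not None:
                return offer
        interval = min(interval * 2, backoff_cutoff)  # capped exponential backoff
    return None
```

The key line is the `except OSError` branch: a client that exits there would stay offline even after the ONT comes back, which matches the "restart pfSense to recover" symptom in this thread.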



  • stephenw10...Thanks for that information. It's great to know the issue is being addressed. I just hope that, as Jim Pingle mentioned in the Redmine issue, it won't be much longer now. I can't wait for it to be resolved. Thanks for the assistance.



  • It just hit me to ask: since this issue affects pfSense's user base, have the pfSense developers considered releasing a fix for just this issue before the next version of pfSense comes out? That would at least alleviate a lot of restart frustration. Just wondering.

