is there a Supported way to have DPinger restart the wan interface on failure.
-
Forgot to list the Version: 23.05.1-RELEASE (amd64)
-
Just experienced this again. The ARP table has entries for the WAN IP but not for the gateway. DHCP seems to retry many times before the actual event. Opening the WAN configuration screen and saving/applying with no changes brings everything back. I really want/need some help creating a firewall that can recover from this. I will provide whatever information you request if possible. Help please.
-
@mikek said in is there a Supported way to have DPinger restart the wan interface on failure.:
I am not getting a lease from modem. I reject leases from 192.168.100.1
That's not a Starlink ?
@mikek said in is there a Supported way to have DPinger restart the wan interface on failure.:
sendto error 64
64 EHOSTDOWN Host is down. A socket operation failed because the destination host was down.
Check the System log ; the WAN NIC didn't fire a DOWN (and UP) event ? This is what happens when the cable is removed, or plain bad, or the other side pulled the connection DOWN for a moment.
The WAN driver knows there is 'nobody' (no host) at the other side of the link, so it signals a '64'.
A test : between your upstream WAN device and pfSense : place a switch.
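A side note on the error code: errno numbers are OS-specific, and on FreeBSD (which pfSense is built on) 64 is EHOSTDOWN. Below is a minimal Python sketch that pulls the code out of a dpinger line and names it. The table is a hand-copied subset of FreeBSD's errno list, and the regular expression only assumes the message shape quoted in this thread, not the exact dpinger log format.

```python
import re

# Subset of FreeBSD errno values. They differ from Linux, so the table is
# hard-coded instead of being read from Python's errno module.
FREEBSD_ERRNO = {
    64: ("EHOSTDOWN", "Host is down"),
    65: ("EHOSTUNREACH", "No route to host"),
}

# Assumed message shape, based on the lines quoted in this thread.
SENDTO_RE = re.compile(r"sendto error:?\s*(\d+)")


def explain_sendto_error(log_line):
    """Return (code, name, description) for a sendto error line, or None."""
    match = SENDTO_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    name, description = FREEBSD_ERRNO.get(code, ("UNKNOWN", "not in this table"))
    return code, name, description


if __name__ == "__main__":
    print(explain_sendto_error("wan_dhcp x.x.x.x : sendto error: 64"))
```
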
-
Gertjan, thanks a lot for the reply.
Arris sb8500, cable modem connected to comcast.
had a switch between pfsense and the modem for a week. made no difference to the behavior.
No WAN up or down events that I can find. This is not a great example, as I was sitting in front of the machine trying to connect to work when the event occurred, so I almost immediately reset the interface to get back online.
18:17:31 dpinger: sends an alarm stating wan_dhcp has packet loss of 31%
18:17:32 php-fpm: one or more tunnel endpoints may have changed ip addresses. reloading endpoints that may use wan_dhcp
18:17:33 I get a bunch of filterdns messages about resolving alias to tables
Then it's absolute silence until 18:35:43 when the "wan_dhcp x.x.x.x : sendto error: 64" events start showing up from dpinger.
18:36:02 i get loss from opt1_vpnv4 from a gateway alarm. it restarts the openvpn tunnel.
still getting "wan_dhcp x.x.x.x : sendto error: 64" events repeating from dpinger
18:36:26 i get a single filterdns message saying it "failed to resolve host xxx will try later again." yes it says "later again" ;)
still getting "wan_dhcp x.x.x.x : sendto error: 64" events repeating from dpinger every second
18:38:38 openvpn tries to restart
18:38:58 openvpn tries to restart
18:39:08 openvpn tries to restart
18:39:18 openvpn tries to restart
18:39:28 openvpn tries to restart
.....open vpn continues this behavior over and over. meanwhile still getting "wan_dhcp x.x.x.x : sendto error: 64" events repeating from dpinger every second
I then open the WAN configuration screen and click save/apply:
18:41:24 I get a flood of different messages:
18:41:24 dhclient : connection closed
18:41:24 dhclient : exiting
some filterdns "failed to resolve host" messages ("will retry later again"),
followed by filterdns events for resolving alias to tables.
Then these messages start repeating:
18:41:25 kernel : arpresolve: can't allocate llinfo for x.x.x.x on igc0
While that is happening, dpinger exits a couple of times:
18:41:25 dpinger : exit on signal 15
Then it happens:
18:41:25 dhclient : PREINIT
dhclient : broadcast request
dhclient : dhcpack from server
I get new route and interface assignments etc...
Then the IP is bound with a renewal of 11113 seconds, followed by a completely working firewall again.
-
After this initial event :
@mikek said in is there a Supported way to have DPinger restart the wan interface on failure.:
18:17:31
What happens on the other channel .... DHCP Logs ?
filterdns messages : that's normal, they complain that there is no working WAN connection, so "DNS" is out of order.
Same thing for the OpenVPN client : it uses the WAN to create its tunnel. That's also a no go while "WAN" is down. It tries to restart without being able to use the WAN interface.
For example these
@mikek said in is there a Supported way to have DPinger restart the wan interface on failure.:
18:41:24 dhclient : connection closed
18:41:24 dhclient : exiting
dhclient, the DHCPv4 client process, quits because the WAN interface went down ( ? ), as if electrically (physically) disconnected.
Afaik : it gets started as soon as the WAN comes up.
The thing is : what is doing all this ?
You said :
@mikek said in is there a Supported way to have DPinger restart the wan interface on failure.:
The solution:
Simply open the (interfaces / wan) configuration screen, make no changes and click save.
Everything immediately returns to a functional state.
My turn :
My ISP router is up and running - it's a device using a fibre uplink.
As soon as I do something like this :
open the (interfaces / wan) configuration screen, make no changes and click save
(identical to what you said)
my WAN connection goes into a permanent UP down UP down sequence.
I've already stopped using the DHCPv4 WAN client; I now use a static IPv4 setup. That was no joy.
Soon, I'm going to use a static IPv6 WAN setup (assuming the prefix I used doesn't change - dunno if this is even possible).
Then : I'm going to ditch all packages that are 'interface' related, one by one.
I'm not using the OpenVPN client, but I do have the OpenVPN server for remote admin.
This isn't a big issue for me, as my ISP router stays up and connected.
pfSense, once started up, is rock solid.
I have a Netgate 4100, using ix3 as my 1Gbit WAN.
I have the impression that we chase the same bug.
Some race condition during "WAN reconstruction".
edit : I don't have the "sendto 64" error messages.
-
dhcpv6 disabled on fw and all internal devices. have no real need for it currently.
dhcp logs last event
09:46:39 dhclient : bound to x.x.x.x --renewal in 43200 seconds
18:41:24 dhclient : connection closed
18:41:24 is when I initiated an action on the console by opening the WAN configuration screen and clicking save, then apply.
at 18:41:26 i have a completely working firewall that is stable again until the next event.
The next event could be in a few hours or a day or two later.
-
The lease time later : 43200 seconds or 12 hours.
If the event happens before, then dhclient is innocent.
-
Should dhcp client not have tried to renew half way through the lease at 6 hours? not a network engineer but that is my understanding.
I also get frequent long sections in my dhclient log where there are repeated dhcprequest . which is why i suspected dhcp as possibly being involved.
hardware configuration:
intel: NUC13ANHI5 - 64GB RAM - 1TB drive - additional i226-V NIC expansion.
-
@mikek said in is there a Supported way to have DPinger restart the wan interface on failure.:
renew half way through the lease at 6 hours?
I vote for 43200, as it says "renewal in 43200".
If the "lease period" is 43200 then the DHCP client will renew halfway, true.
-
Yes, you are right. It appears the lease expiration is:
option dhcp-lease-time 86400;
option dhcp-renewal-time 43200;
option dhcp-rebinding-time 75600;
Which means I should see a renew attempt in the logs at 12 hours, but it never made it that long. Yet another bunny trail.
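For what it's worth, the three numbers above line up with the default DHCPv4 timers from RFC 2131 (renewal at 0.5 of the lease, rebinding at 0.875). A small Python sketch of that arithmetic follows. It assumes the 09:46:39 "bound" entry and the 18:17:31 alarm fall on the same day, and the calendar date in the code is only a placeholder.

```python
from datetime import datetime, timedelta


def dhcp_timers(lease_seconds):
    """RFC 2131 defaults: renewal (T1) = 0.5 * lease, rebinding (T2) = 0.875 * lease."""
    return {
        "renewal": int(lease_seconds * 0.5),
        "rebinding": int(lease_seconds * 0.875),
    }


def renewal_due(bound_at, lease_seconds):
    """Time at which the client should first try to renew its lease."""
    return bound_at + timedelta(seconds=dhcp_timers(lease_seconds)["renewal"])


if __name__ == "__main__":
    print(dhcp_timers(86400))

    # Placeholder date; only the times of day come from the logs in this thread.
    bound = datetime(2000, 1, 1, 9, 46, 39)
    first_alarm = datetime(2000, 1, 1, 18, 17, 31)
    due = renewal_due(bound, 86400)
    print("renewal due at", due.time())
    print("failure started before renewal was due:", first_alarm < due)
```

Under those assumptions the first dpinger alarm comes well before the renewal is due, which matches the point that the failure did not wait for the lease renewal.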
All of this and we are right where we started, with an issue we can't fully identify yet and no way to help the firewall recover, even though we know the steps needed for recovery. Any idea how to force dpinger to initiate self recovery on failure in a supportable way? At least then the failures would be short until the issue/resolution could be discovered.
Guess I could cron a reboot or interface down/up every X hours, but that does seem a bit excessive, intrusive and may even be worse than the original failures.
Be nice if there was a "re-initialize all" option on failure instead of just state options for interface failure actions.
There are a couple of scripts I have found on the internet that can reboot only on ping failure, but in the years I have used pfSense, I never needed them before. Don't really want to start down the road of adding unsupported scripts. Seems like a recipe for future troubles.
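Purely as an illustration of what such ping-failure scripts tend to do (this is not a supported pfSense feature), the core logic can be sketched in Python as below. Everything here is hypothetical: the function names, the threshold, and above all the recovery action, which is left as a placeholder because this thread does not establish which command reproduces the save/apply on the WAN page.

```python
import subprocess
import time


def should_recover(results, threshold):
    """True when the last `threshold` probe results are all failures."""
    if threshold <= 0 or len(results) < threshold:
        return False
    return not any(results[-threshold:])


def ping_once(target, timeout_s=2):
    """Send one ICMP probe with the system ping; True if it got a reply."""
    try:
        completed = subprocess.run(
            ["ping", "-c", "1", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return completed.returncode == 0


def watch(probe_fn, recover_fn, threshold=5, interval_s=10, max_cycles=None):
    """Probe repeatedly and call recover_fn after `threshold` failures in a row.

    Returns how many times recovery was triggered (useful for dry runs).
    """
    history = []
    recoveries = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        history.append(probe_fn())
        history = history[-threshold:]
        if should_recover(history, threshold):
            recover_fn()          # placeholder for the real recovery action
            recoveries += 1
            history = []          # start counting again after a recovery
        cycles += 1
        time.sleep(interval_s)
    return recoveries


if __name__ == "__main__":
    # Dry run with simulated probe results instead of real pings.
    outcomes = iter([True, False, False, False, True])
    count = watch(
        lambda: next(outcomes),
        lambda: print("recovery action would run here"),
        threshold=3,
        interval_s=0,
        max_cycles=5,
    )
    print("recoveries triggered:", count)
```

Requiring several consecutive failures before acting is the design choice that keeps a single lost ping from bouncing the interface. For real use, `probe_fn` could be something like `lambda: ping_once("x.x.x.x")` pointed at the gateway.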