Packet Loss Restart Script
-
Hello. At times when I find I have no internet connectivity and I log in to the pfSense Dashboard GUI, there is no WAN IP address associated with the WAN interface and the Gateways show one hundred percent packet loss. This is usually resolved by logging into the terminal and restarting pfSense. I'm not a coder by any means, so I'm wondering if anyone knows of a cron script that can check the Gateways for one hundred percent packet loss every so often and then restart pfSense? Thank you.
-
What sort of WAN is it?
If it has no IP, I expect it to be trying to reconnect anyway. If it isn't, the logs should show why not.
dpinger already monitors the WAN connection and fires off some scripts when it throws a packet loss alarm. You could add something to those if it's required, which it shouldn't be.
Steve
-
There's already a script that does exactly that.
It just doesn't restart the entire system: it takes the (WAN) interface down, brings it back up again, and that event starts a connection renegotiation. The tool is called 'dpinger'.
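A minimal sketch of the idea behind dpinger's loss figure, under a simplified model that is an assumption, not dpinger's actual code: a probe counts as lost if no reply arrives within a loss interval, and loss is the share of lost probes within a sliding time period. The values used here (2000 ms loss interval, 60 s period) are what I believe pfSense uses by default; check System > Routing > Gateways > Edit for the real ones. The class name `LossWindow` is hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class LossWindow:
    """Simplified sliding-window packet-loss estimate (illustrative, not dpinger's code)."""

    def __init__(self, period_ms: int = 60_000, loss_interval_ms: int = 2_000) -> None:
        self.period_ms = period_ms                # averaging window (assumed default)
        self.loss_interval_ms = loss_interval_ms  # reply deadline (assumed default)
        # Each entry: (send time in ms, reply time in ms or None if no reply yet)
        self.probes: Deque[Tuple[int, Optional[int]]] = deque()

    def record(self, sent_ms: int, reply_ms: Optional[int]) -> None:
        self.probes.append((sent_ms, reply_ms))

    def loss_percent(self, now_ms: int) -> float:
        # Forget probes that fall outside the averaging period.
        while self.probes and self.probes[0][0] < now_ms - self.period_ms:
            self.probes.popleft()
        # Only judge probes whose deadline has passed; newer ones may still be answered.
        judged = [(s, r) for s, r in self.probes if now_ms - s >= self.loss_interval_ms]
        if not judged:
            return 0.0
        # A probe is lost if it never got a reply, or the reply came after the deadline.
        lost = sum(1 for s, r in judged if r is None or r - s > self.loss_interval_ms)
        return 100.0 * lost / len(judged)
```

With probes every 500 ms and every fifth one unanswered, the window reports 20% loss; only when every probe in the period goes unanswered does it reach 100%.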
Go to System > Routing > Gateways and click the edit button of your listed gateway.
You wind up at System > Routing > Gateways > Edit, where you can set up what needs to be done when an interface doesn't pass traffic any more (to be more exact: when the pings that are sent don't come back).
Restarting pfSense is somewhat like the Belgians who go to bed in the evening with a brick: they throw it at the light bulb, as that's how they put out the light (they never understood the light switch).
The reason the connection went bad probably has nothing to do with pfSense; it's most likely an upstream issue.
-
Hey Steve... not sure what you mean by what kind of WAN it is. It's just the WAN on my home network. I expected dpinger to be monitoring the WAN. I checked the settings for dpinger and there's no restart option that I could find, and I'm using 2.5.2. This happened a few days ago and I haven't checked the logs. Typically the issue is upstream, though. I would just like a restart option when there's 100% packet loss; that way, when I'm away from home, I don't have to worry about not being able to VPN to pfSense because it hasn't resolved itself. It always seems that when I restart pfSense, it resolves the issue.
I checked the dpinger advanced settings and didn't see a restart option. I agree about the restart, but hey, it works.
-
I mean how is the WAN configured? Since it has no IP it's unlikely to be static so DHCP? PPPoE?
Check the DHCP logs if that's what it is. Check the system logs when it disconnects.
Steve
-
@stephenw10
The WAN is configured as DHCP. It just did it again twice right before I posted here and I restarted twice, so at the time it occurred:
The DHCP logs are inconclusive.
The Gateways logs just show:
dpinger 18267 WAN_DHCP <IP Address>: Alarm latency 8529us stddev 13328us loss 21%
System General shows:
rc.gateway_alarm 44530 >>> Gateway alarm: WAN_DHCP (Addr:<IP Address> Alarm:1 RTT:8.529ms RTTsd:13.328ms Loss:21%)
Ok, this is becoming untenable. I didn't want this to become a hunt for what's causing it.
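A rough sketch for pulling the numbers out of the two alarm lines above so they can be compared with the monitoring thresholds. The regular expressions are written against the exact line formats quoted in this thread and are assumptions about the general format; `parse_alarm` and `GatewayAlarm` are hypothetical names. Both lines describe the same event: dpinger logs latency in microseconds (8529us) while rc.gateway_alarm logs milliseconds (8.529ms), and both report 21% loss, not 100%.

```python
import re
from typing import NamedTuple, Optional


class GatewayAlarm(NamedTuple):
    gateway: str
    addr: str
    alarm: int
    rtt_ms: float
    rttsd_ms: float
    loss_pct: float


# rc.gateway_alarm format (as quoted in this thread):
# ">>> Gateway alarm: WAN_DHCP (Addr:... Alarm:1 RTT:8.529ms RTTsd:13.328ms Loss:21%)"
RC_ALARM_RE = re.compile(
    r"Gateway alarm: (?P<gw>\S+) \(Addr:(?P<addr>.*?) Alarm:(?P<alarm>\d+) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>[\d.]+)%\)"
)

# dpinger format (as quoted in this thread):
# "dpinger 18267 WAN_DHCP ...: Alarm latency 8529us stddev 13328us loss 21%"
DPINGER_ALARM_RE = re.compile(
    r"dpinger \d+ (?P<gw>\S+) (?P<addr>.*?): Alarm latency (?P<lat>\d+)us "
    r"stddev (?P<sd>\d+)us loss (?P<loss>\d+)%"
)


def parse_alarm(line: str) -> Optional[GatewayAlarm]:
    """Parse either alarm log format into one record; None if the line matches neither."""
    m = RC_ALARM_RE.search(line)
    if m:
        return GatewayAlarm(m["gw"], m["addr"], int(m["alarm"]),
                            float(m["rtt"]), float(m["rttsd"]), float(m["loss"]))
    m = DPINGER_ALARM_RE.search(line)
    if m:
        # dpinger reports microseconds; convert to ms to match rc.gateway_alarm.
        return GatewayAlarm(m["gw"], m["addr"], 1,
                            int(m["lat"]) / 1000, int(m["sd"]) / 1000, float(m["loss"]))
    return None
```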
-
Well it would be much better to find the cause of issue and fix it than patching it to reboot every time there's packet loss.
-
@stephenw10
I was just looking for a quick fix until I could figure it out, that quick fix being to restart pfSense. I was hoping there would be some kind of cron script that could do just that: check every so often and, if there is 100% packet loss, restart. I'm just not a coder, so I can't develop the script myself.
-
@stephenw10
Prior to version 2.5.0, pfSense had a Redmine ticket, Bug #9267, https://redmine.pfsense.org/issues/9267. This bug was fixed in version 2.5.0 and life was grand. Now version 2.5.2 is exhibiting the same issue seen before version 2.5.0. It's kind of hard to VPN to pfSense when the WAN is at 100% packet loss. If this is a pfSense issue, someone please fix it.
-
@newuser2pfsense said in Packet Loss Restart Script:
https://redmine.pfsense.org/issues/9267
Ok, yeah, as I wrote there, this was fixed in 2.4.5 and there isn't a known regression.
However, it's easy to see if you're hitting that, because the DHCP client stops trying to get a lease. Do you see it continually trying to get a new lease in the logs? That's what you should see.
If there really has been some regression there, it should be obvious and we can probably patch it quite easily.
Steve
-
@stephenw10
I just checked the logs: System > General and System > Gateways. There's absolutely nothing trying to get a new lease anywhere. However, I am finding an effton, and I mean an effton, of these dpinger messages in the Gateways logs:
dpinger 74673 WAN_DHCP <IP Address>: sendto error: 65
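On FreeBSD, which pfSense is built on, errno 65 is EHOSTUNREACH ("No route to host"), which would fit a WAN interface that has lost its address and so has no route to the monitor IP. A small lookup sketch: the table holds a handful of network errno values from FreeBSD's <sys/errno.h> (Linux numbers differ, so Python's own `errno` module is deliberately not used), and `describe_sendto_error` is a hypothetical helper.

```python
import re
from typing import Optional

# A few network-related errno values from FreeBSD's <sys/errno.h>.
# Hard-coded because Linux numbers differ (EHOSTUNREACH is 113 there).
FREEBSD_NET_ERRNOS = {
    49: ("EADDRNOTAVAIL", "Can't assign requested address"),
    50: ("ENETDOWN", "Network is down"),
    51: ("ENETUNREACH", "Network is unreachable"),
    55: ("ENOBUFS", "No buffer space available"),
    64: ("EHOSTDOWN", "Host is down"),
    65: ("EHOSTUNREACH", "No route to host"),
}

SENDTO_RE = re.compile(r"sendto error: (\d+)")


def describe_sendto_error(line: str) -> Optional[str]:
    """Translate the number in a dpinger 'sendto error: N' line, if present."""
    m = SENDTO_RE.search(line)
    if not m:
        return None
    code = int(m.group(1))
    name, text = FREEBSD_NET_ERRNOS.get(code, (f"errno {code}", "not in this table"))
    return f"{name} ({text})"
```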
In the graph I provided above, when the Gateway came back online, that was due to a forced restart. The Gateway had 100% packet loss for hours.
I'm hoping someone can take a look at it and patch it really soon! I need to be able to VPN.
-
@newuser2pfsense said in Packet Loss Restart Script:
https://redmine.pfsense.org/issues/9267
Ok, that's expected since it doesn't actually have that IP. What's the last thing shown there before that though? Is it incorrectly binding a previous lease IP as shown in the bug report?
Steve
-
@stephenw10
The only thing it shows right before the 100% packet loss is in System > General:
rc.gateway_alarm 26512 >>> Gateway alarm: WAN_DHCP (Addr:<IP Address> Alarm:1 RTT:9.898ms RTTsd:19.454ms Loss:22%)
Unfortunately, my System > Gateways logs don't go back far enough for me to tell.
-
@stephenw10
It's at it again. This is what I saw when I got up this morning. I had to log in to the terminal to restart, yet again, before I could even post this message. This is awful...
In System Logs > DHCP, I am getting a large, repeating number of the following block of data:
dhclient 73024 No DHCPOFFERS received.
dhclient 73024 No working leases in persistent database - sleeping.
dhclient 68432 FAIL
dhclient 73024 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 2
dhclient 73024 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 3
dhclient 73024 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 3
dhclient 73024 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 7
dhclient 73024 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 11
dhclient 73024 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 21
dhclient 73024 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 14
-
Ah, OK. Those DHCP logs show it's not the same bug then. With that bug present it would simply stop trying to get a new lease.
Ok, so the dhcp logs show it is trying to pull a new lease and nothing is responding.
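That distinction can be checked mechanically: with bug #9267 the client stops sending DHCPDISCOVERs, whereas here it keeps sending them and nothing answers. A sketch that classifies a batch of dhclient log lines; the regex matches the line format quoted above and `classify_dhcp_log` is a hypothetical name. The seven intervals in the quoted block add up to 61 seconds, so that burst of retries spans roughly a minute.

```python
import re
from typing import Iterable, List

# Matches e.g. "dhclient 73024 DHCPDISCOVER on igb7 to 255.255.255.255 port 67 interval 2"
DISCOVER_RE = re.compile(
    r"DHCPDISCOVER on (?P<iface>\S+) to \S+ port \d+ interval (?P<interval>\d+)"
)


def discover_intervals(lines: Iterable[str], iface: str) -> List[int]:
    """Retry intervals (seconds) of the DHCPDISCOVERs dhclient logged for `iface`."""
    intervals = []
    for line in lines:
        m = DISCOVER_RE.search(line)
        if m and m.group("iface") == iface:
            intervals.append(int(m.group("interval")))
    return intervals


def classify_dhcp_log(lines: Iterable[str], iface: str) -> str:
    lines = list(lines)
    intervals = discover_intervals(lines, iface)
    no_offers = any("No DHCPOFFERS received" in line for line in lines)
    if not intervals:
        # Nothing sent: the "client stopped trying" pattern described for bug #9267.
        return "not retrying"
    if no_offers:
        # Client keeps asking but gets no answer (or the requests never leave
        # the box, which a packet capture on the interface would show).
        return "retrying, no offers"
    return "retrying"
```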
Is igb7 the correct interface?
If you run a pcap on igb7 at the time can you see it actually sending those dhcpdiscover packets?
If you disconnect the cable from igb7 and reconnect it, does that bring it back?
You could just add a reboot command to /etc/rc.gateway_alarm, but I wouldn't recommend it.
That could be triggered by other things at other times with unexpected consequences.
If you can bring the WAN back by disconnecting it or reapplying its settings, you could at least set that instead, which would be a lot less disruptive.
Steve
-
@stephenw10
Yes, igb7 is the WAN interface.
I really didn't want to run a pcap on igb7 and go down this rabbit hole right now. Unfortunately, I don't have the time to devote to troubleshooting this issue. That's why I was looking for a quick fix. Also, when you are away and trying to VPN in while it's in this condition, you can't do anything at all. I do have a cron job set up to restart pfSense at certain intervals, and I can VPN in right after that restart, but those times are not always convenient. That's why I was hoping for a cron script that would check the WAN interface for packet loss every so often, say every 5 minutes, and if there is 100% packet loss, restart. Right now, restarting seems to fix it for a short time. I really don't want to have to go back to IPFire. I really like pfSense!
-
Well, you can use /etc/rc.gateway_alarm as I said. That is triggered by dpinger whenever a gateway goes outside the configured monitor settings. No need to run it as a cron job.
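A sketch of the kind of guard you would want before letting an alarm hook do anything drastic, since it can fire for other reasons. Assumptions, stated loudly: the high thresholds of 500 ms latency and 20% loss (and low thresholds of 200 ms and 10%) are what I believe pfSense ships with by default; the strict "greater than" comparison is assumed; `alarm_level` and `should_act` are hypothetical helpers, not part of pfSense. With those defaults, the 21% and 22% losses logged in this thread are enough to raise an alarm even though they are far from 100%, which is why the guard also requires total loss on the WAN gateway.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    # ASSUMED pfSense defaults; check System > Routing > Gateways > Edit.
    latency_low_ms: float = 200.0
    latency_high_ms: float = 500.0
    loss_low_pct: float = 10.0
    loss_high_pct: float = 20.0


def alarm_level(rtt_ms: float, loss_pct: float, t: Thresholds = Thresholds()) -> str:
    """'alarm' above a high threshold, 'warning' above a low one, else 'ok'."""
    if rtt_ms > t.latency_high_ms or loss_pct > t.loss_high_pct:
        return "alarm"
    if rtt_ms > t.latency_low_ms or loss_pct > t.loss_low_pct:
        return "warning"
    return "ok"


def should_act(gateway: str, alarm_flag: int, loss_pct: float,
               wan_gateway: str = "WAN_DHCP") -> bool:
    """Hypothetical guard: act only on a raised alarm, on the WAN gateway, at total loss."""
    return gateway == wan_gateway and alarm_flag == 1 and loss_pct >= 100.0
```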
Steve
-
@stephenw10
I just looked at the contents of the /etc/rc.gateway_alarm file and wouldn't even know where to begin, as I'm not a software programmer/developer.
-
From your other thread, a year or two ago, I understood that you have an 'ONT'.
Is this a 'fibre' to 'fibre' box, so just a passive fibre cable interconnection, or fibre to Ethernet, and thus powered by its own power brick?
I'm testing a powered ONT myself, a 'small box' my ISP put in place recently.
In particular, what happens when the power fed into the ONT is not 'stable'? How does it reset? To make a long story short: my opinion is that this box should be powered by a UPS, just like pfSense.
This is me thinking: what if your connection, the fibre to the ISP, IS ok, but the ONT goes brain dead after a power issue and only reconnects correctly when the Ethernet link state goes from UP to DOWN to UP?
That would explain that rebooting pfSense rebuilds a good connection.
If so, instead of rebooting pfSense you could make your connection work by unplugging the pfSense WAN cable for a short moment.
Or remove the power from the ONT and put it back on again.
Can you confirm all this? Did I describe your installation correctly?
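If bouncing the link does bring the ONT back, a software equivalent of pulling the cable is a much lighter hammer than a reboot. A minimal sketch that takes the interface down and up with FreeBSD's `ifconfig`; `igb7` is the WAN interface named in this thread, the 5-second pause is an arbitrary assumption, and `bounce_interface` is a hypothetical helper that defaults to a dry run. Whether pfSense's own link-up handling then restarts DHCP cleanly is not something I have verified, so try it by hand before automating it.

```python
import subprocess
import time
from typing import List


def bounce_commands(iface: str) -> List[List[str]]:
    """FreeBSD ifconfig commands that take the link down and bring it back up."""
    return [["/sbin/ifconfig", iface, "down"], ["/sbin/ifconfig", iface, "up"]]


def bounce_interface(iface: str, pause_s: float = 5.0, dry_run: bool = True) -> List[List[str]]:
    """Bounce `iface`; with dry_run=True (the default) only print what would run."""
    down, up = bounce_commands(iface)
    for cmd in (down, up):
        if dry_run:
            print("would run:", " ".join(cmd))
            continue
        subprocess.run(cmd, check=True)
        if cmd is down:
            # Give the ONT time to notice the link drop before restoring it.
            time.sleep(pause_s)
    return [down, up]
```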
-
@gertjan
The ONT is fibre to Ethernet. It has its own power supply that's mounted to the wall inside the home. The ONT power supply has been connected to a UPS since I moved in, for clean, continuous power, or at least until the battery runs out during a long power outage.
I still think that creating a file with code to check the WAN gateway for packet loss is the way to go for those intermittent times. Create the file with the code, park it somewhere in the file system, and then create a cron job that repeats every 5 minutes or so and runs the file. If there is packet loss, restart; if not, exit. I'm just not a coder.
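A minimal sketch of the watchdog described above, assuming a Python 3 interpreter is available on the firewall (if it isn't, the same logic is easy to express as a shell script). It pings a target a few times with the system `ping`, parses the "% packet loss" summary that both FreeBSD and Linux print, and reboots only when loss reaches 100% and `--really-reboot` is passed; otherwise it just reports. The script name, target address, and flag are hypothetical.

```python
import re
import subprocess
import sys
from typing import List, Optional

# Matches the summary printed by FreeBSD and Linux ping, e.g.
# "5 packets transmitted, 0 packets received, 100.0% packet loss"
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output: str) -> Optional[float]:
    """Return the packet-loss percentage from ping's summary, or None if absent."""
    m = LOSS_RE.search(ping_output)
    return float(m.group(1)) if m else None


def measure_loss(target: str, count: int = 5) -> float:
    """Ping `target` `count` times; failing to get a summary counts as 100% loss."""
    try:
        proc = subprocess.run(["ping", "-c", str(count), target],
                              capture_output=True, text=True, timeout=60)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return 100.0
    loss = parse_loss(proc.stdout)
    return 100.0 if loss is None else loss


def decide(loss_pct: float, threshold: float = 100.0) -> str:
    """Only total loss triggers a reboot; partial loss (like the 21% alarms) does not."""
    return "reboot" if loss_pct >= threshold else "ok"


def main(argv: List[str]) -> None:
    target = argv[0]
    really_reboot = "--really-reboot" in argv[1:]
    loss = measure_loss(target)
    action = decide(loss)
    print(f"{target}: {loss:.1f}% loss -> {action}")
    if action == "reboot" and really_reboot:
        # FreeBSD (pfSense) reboot command.
        subprocess.run(["/sbin/shutdown", "-r", "now"], check=False)


# Does nothing unless a target address is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Scheduled every 5 minutes (for example `*/5 * * * * python3 /root/wan_watchdog.py 1.1.1.1 --really-reboot` via the Cron package, with a hypothetical path and target), this will also reboot every 5 minutes if the outage is genuinely upstream, so requiring several consecutive failures, or bouncing the link instead of rebooting, would be a safer refinement.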
It just happened again. When I unplugged the WAN ethernet cable and plugged it back in, the pfSense Dashboard GUI showed no gateways and it stayed that way. I had to restart again.
Interestingly, instead of restarting the pfSense box, I pressed the on/off switch for a hard shutdown. I then pushed the on button and started pfSense. I've done this a couple of times and the WAN stayed up for a little while. I don't know if this has any relevance or not.