WAN gateway stops working after packet loss

HyperFlight

Hey all,

Finally got round to setting up pfSense, and I'm enjoying learning the ropes. However, since I set it up, I've had issues with the WAN every night. As a caveat: my ISP is TalkTalk and they've been terrible. I had an email a couple of weeks ago informing me that they'd be performing upgrades over several nights, and shortly afterwards my trusty old Asus DSL-N55U started having a meltdown and needed daily reboots. The original TalkTalk modem router is solid as a rock once I plug it back in. This leads me to believe that something has changed, but I've no idea what. All of this prompted me to get my new setup done, so I've got pfSense up and running on a nice little Jetway box, and bought a DrayTek Vigor 130 to take over modem duties.

The gateway logs show dpinger picking up packet loss at the relevant time and alarming. After these events, whilst the gateway shows as up, the internet simply doesn't work. Restarting the gateway brings it back straight away. What I'm unsure of in these logs is why the dest_addr changes every time. I understand the ISP giving me a different IP each time, as I'm not on a static IP.

Any help would be appreciated!

[Attachment: dpinger_log.PNG]

kpa

Disable the gateway monitor for WAN if you're not going to use it for anything. This will stop the gateway from being marked as offline when there is too much packet loss.

HyperFlight

@kpa:

Disable the gateway monitor for WAN if you're not going to use it for anything. This will stop the gateway from being marked as offline when there is too much packet loss.

Thanks @kpa, I'll try that tonight. It feels like I'd just be ignoring the root cause though. General packet loss would be one thing, but this daily dropout between certain hours has me perplexed!

Is that the expected behaviour of the gateway monitor in this case? Specifically that it raises the alarm and marks the gateway as offline? (Even though the WAN still shows as online).

kpa

The monitoring is what makes failover/load balancing possible in a multi-WAN system:

https://doc.pfsense.org/index.php/Multi-WAN

HyperFlight

@kpa:

The monitoring is what makes failover/load balancing possible in a multi-WAN system:

https://doc.pfsense.org/index.php/Multi-WAN

Ah ok. Thanks for the link!

My preference would be to leave the monitor enabled so that I can fire up graphs on packet loss etc when I have the inevitable usual connectivity problems on my line and I want to shout at TalkTalk some more!

From a bit of searching, and given I'm on a single WAN setup, I think I'll try toggling this option: State Killing on Gateway Failure.

If that doesn't work then I'll disable the monitor. I'm starting to think this is not actually 'packet loss', but simply that the gateway doesn't recover once the ISP doles out a new IP. I'm a bit surprised at this. Would we not expect the gateway to recover in that situation and adapt to the new WAN IP? I'm hoping that by flushing the states once the gateway goes down, it'll pick up the new IP.

pwood999

Change the GW Monitor to use an external IP (OpenDNS, etc) rather than the WAN address.

This way, when TalkTalk changes your dynamic public IP, the modem should ARP the change to pfSense, and the next time dpinger runs it should see the new route.

HyperFlight

@pwood999:

Change the GW Monitor to use an external IP (OpenDNS, etc) rather than the WAN address.

This way, when TalkTalk changes your dynamic public IP, the modem should ARP the change to pfSense, and the next time dpinger runs it should see the new route.

That makes a lot of sense, will give it a go!

HyperFlight

Just as an update (and a couple more questions):

Changing the monitor IP certainly helped things. I achieved a solid period of > 36 hours of uptime, which I'd not managed before. The previous issue of the gateway monitoring the old gateway IP after it had been renewed by the ISP doesn't appear to be a factor any more.

However, I've still had a couple of 'WAN wobbles' in the last 24 hours. The first one didn't bring the gateway down, but the second one did. Both are marked in the gateway logs by dpinger alarms. I have a couple of questions on the back of that:

In terms of logging and behaviour, at what point does the gateway get marked as down? That's not indicated in the log explicitly, so is it the alarm condition that does it?
Why is it that the gateway never tries to recover? I wonder if I've misinterpreted how gateways work. I appreciate that they handle failover in multi-WAN setups etc, but I'm confused as to why a single gateway doesn't seem to have the option to restart itself at certain intervals.

Apologies if this is just me being a n00b :)

For the moment, the next step I've taken is to tick 'Disable Gateway Monitoring Action' next to 'Gateway Action' in the settings. This feels like it achieves the goal of still having the monitoring, but won't nuke the gateway if it alarms.
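To make the distinction between the monitor alarming and the gateway being acted on a bit more concrete, here is a toy model in Python. It is only a sketch of the idea, not how dpinger or pfSense is actually implemented: the class name, the sliding window of probe results, and the 20% loss threshold are all assumptions made for illustration.

```python
from collections import deque


class GatewayMonitor:
    """Toy model of a loss-based gateway monitor (hypothetical, not dpinger's code)."""

    def __init__(self, window=10, loss_alarm_pct=20, action_disabled=False):
        # Sliding window of recent probe results: True means a reply came back.
        self.results = deque(maxlen=window)
        # Assumed alarm threshold, as a percentage of probes lost in the window.
        self.loss_alarm_pct = loss_alarm_pct
        # Models ticking 'Disable Gateway Monitoring Action'.
        self.action_disabled = action_disabled

    def record(self, reply_received):
        self.results.append(reply_received)

    def loss_pct(self):
        if not self.results:
            return 0.0
        lost = sum(1 for ok in self.results if not ok)
        return 100.0 * lost / len(self.results)

    def alarm(self):
        # The monitor still measures loss and alarms either way.
        return self.loss_pct() >= self.loss_alarm_pct

    def gateway_treated_as_down(self):
        # With the action disabled, an alarm no longer takes the gateway down.
        return self.alarm() and not self.action_disabled


mon = GatewayMonitor(action_disabled=True)
for ok in [True] * 7 + [False] * 3:
    mon.record(ok)

print(mon.loss_pct(), mon.alarm(), mon.gateway_treated_as_down())
# prints: 30.0 True False
```

With `action_disabled=False` the same probe results would leave the gateway treated as down, which matches the behaviour I was seeing before ticking the box.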

Thanks in advance for any additional insight you can give.

silverberg

Hi, did you find a solution for this? I've been having this issue for about a month but can't find a solution. I've even changed hardware.

HyperFlight

@silverberg:

Hi, did you find a solution for this? I've been having this issue for about a month but can't find a solution. I've even changed hardware.

What have you tried, and exactly what behaviour are you experiencing? As per my last post, the combination of changing the monitoring IP and ticking 'Disable Gateway Monitoring Action' has sorted it for me.