WAN alarm triggers complete loss of internal routing
-
Hi,
I am probably missing something real simple but I am puzzled why this event:
Sep 7 14:58:12 dpinger 77623 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 81.1.113.55 bind_addr 89.240.248.21 identifier "WAN_PPPOE "
results in pfSense 2.6.0 becoming unreachable on the LAN... a cold boot seems to be the only cure.
I have since disabled Gateway monitoring - not sure if this will help?
Thank you for any assistance / insight offered.
-
That log entry is dpinger starting when the PPPoE WAN comes up.
That implies it went down.
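To make it easier to see that this line is a startup parameter dump rather than an alarm, here is a minimal sketch that pulls a single value out of it. The function name dpinger_param is made up for illustration, and it assumes the line keeps the "name value" pair layout shown in the log entry above.

```shell
# Print the value that follows a given parameter name in a dpinger startup line.
# Reads the log line(s) on stdin; $1 is the parameter name, e.g. loss_alarm.
# dpinger_param is a hypothetical helper, not part of pfSense or dpinger.
dpinger_param() {
    awk -v key="$1" '{
        # Walk the whitespace-separated fields and print the one after the key.
        for (i = 1; i < NF; i++)
            if ($i == key) { print $(i + 1); exit }
    }'
}
```

Feeding the quoted log line to dpinger_param loss_alarm and dpinger_param latency_alarm shows the thresholds the monitor was started with; nothing in the line says that either threshold was crossed.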
By far the most common cause of issues like this is that there are multiple gateways defined and the default IPv4 gateway is still set to automatic.
Go to System > Routing > Gateways and set the Default v4 gateway to WAN_PPPoE.
You probably have invalid gateway(s) configured on internal interfaces, unless you are actually routing internally.
Steve
-
Thanks Steve.
Under System / Routing / Gateways there has only ever been one gateway, WAN_PPPOE. No static routes or gateway groups.
All the internal L3 routing is via the pfSense, 9x /24 interfaces defined. IPv4 upstream gateway is set to none on all of these interfaces.
When the WAN drops, none of the VLAN IPs are reachable from devices in those subnets. Reboot the pfSense and connectivity is restored.
I would expect the internal routing to work regardless of WAN connectivity.
-
@korgua said in WAN alarm triggers complete loss of internal routing:
I would expect the internal routing to work regardless of WAN connectivity.
Yeah that would/should be the case.. You're saying that when the WAN goes down you cannot even ping the pfSense IP from a device in that segment..
And you can ping it before, right - it's not like you have some rules on the interface blocking ping? Do you have some policy routing set up? What do your rules look like.. because with WAN monitoring it is possible, when you have rules set up with a gateway and that gateway is down, that those rules are not loaded.. But that shouldn't prevent local routing, since if you policy route out your WAN, to get to other local networks you would need rules above that anyway, which shouldn't go away.
system / advanced / misc
Is it possible that when the WAN goes down, pfSense crashes completely?
-
Yup, not that common issue then. More digging required!
-
Thanks John.
Yes, ping is allowed as is web access to the FW from some trusted subnets. On reboot the ping and web access resumes as does all the L3 routing.
All the rules have * under "Gateway"
No policy routing is setup.
pfSense has been working faultlessly for a few months....two WAN wobbles and LAN routing stops.
It's a headless mini PC with a single NIC, that requires a cold boot to get working again.
-
@korgua said in WAN alarm triggers complete loss of internal routing:
It's a headless mini PC with a single NIC
Ah - so this is just single nic.. And your other networks are just vlans on this physical nic.. That throws a wrench into it.
Are you saying before - when your wan would go down, you were able to still route between your vlans?
-
We had one outage when the FTTP was first installed - put that down to 'other' events.
Since then, no WAN issues but two today and LAN routing stops...so, to answer your question, I think that from day 1, WAN outage would cause LAN routing to stop.
-
So hosts in each VLAN can still reach pfSense but not any other subnet?
-
No, pfSense will not communicate with any device....i.e. pinging the local gateway or web browsing to it does not work. Apologies, I can see why my answers may have suggested this.
I have done the same configuration / setup on VMs before and they do not exhibit this issue.
-
Hmm, so actually it loses connectivity on all interfaces?
What are those interfaces, how are they connected?
Nothing else is logged?
-
@stephenw10 said in WAN alarm triggers complete loss of internal routing:
What are those interfaces, how are they connected?
Looks like this is a 1 nic device..
It's a headless mini PC with a single NIC
-
Ah, that would do it!
Is it a Realtek NIC?....
-
It is
re0@pci0:2:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x15 hdr=0x00
vendor = 'Realtek Semiconductor Co., Ltd.'
device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
class = network
subclass = ethernet
It has been performant and trouble free - until the WAN wobbles.
Other than swapping out the hardware is there something else to configure / check?
Addendum:
Id Refs Address Size Name
1 26 0xffffffff80200000 3aed878 kernel
2 1 0xffffffff83cee000 39adb0 zfs.ko
3 2 0xffffffff84089000 9860 opensolaris.ko
4 1 0xffffffff84093000 d01c0 if_re.ko
5 1 0xffffffff84321000 1000 cpuctl.ko
6 1 0xffffffff84322000 2150 acpi_wmi.ko
7 1 0xffffffff84325000 ab40 snd_uaudio.ko
8 1 0xffffffff84330000 8e10 aesni.ko
Under System / Advanced / Networking
All ticked except ARP handling and Reset all states
-
Check the system logs for watchdog timeout messages from the re driver. If you see any, try using the alternative driver. Are you running 2.7?
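As a rough sketch of that log check, the helper below counts lines of the form "re0: watchdog timeout" in a log file. The function name count_re_watchdog is made up for illustration, and the default path /var/log/system.log is an assumption; adjust it to wherever your system log is kept.

```shell
# Count watchdog timeout messages logged by the re driver.
# Usage: count_re_watchdog [LOGFILE]
# count_re_watchdog is a hypothetical helper; the default path is an assumption.
count_re_watchdog() {
    logfile="${1:-/var/log/system.log}"
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are no matches, so the exit status is deliberately ignored.
    grep -c -E '(^|[^[:alnum:]_])re[0-9]+: watchdog timeout' "$logfile" || true
}
```

Running count_re_watchdog with no argument checks the assumed default log; any result other than 0 suggests the driver is worth looking at.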
-
Looking at some notes I made we had watchdog timeouts when performing speedtests on the new FTTP.
We did the following
fetch -v https://pkg.opnsense.org/FreeBSD:12:amd64/snapshots/latest/All/realtek-re-kmod-196.04.txz
pkg install -f -y realtek-re-kmod-196.04.txz
and added
if_re_load="YES"
if_re_name="/boot/modules/if_re.ko"
to /boot/loader.conf.local
This resolved the issue with watchdog timeouts - not seen one since.
We are running 2.6
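One way to confirm that the separately installed module is actually loaded is to look for if_re.ko in kldstat output, as in the listing posted earlier. A minimal sketch, assuming the module name appears in the last column as it does there; re_kmod_status is a made-up helper name.

```shell
# Report whether if_re.ko shows up as a loaded module in kldstat output.
# Reads kldstat output on stdin; prints "loaded" or "not loaded".
# re_kmod_status is a hypothetical helper, not a pfSense or FreeBSD command.
re_kmod_status() {
    # awk exits 0 only if some line has if_re.ko as its last field.
    if awk '$NF == "if_re.ko" { found = 1 } END { exit !found }'; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}
```

Typical use would be: kldstat | re_kmod_status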
-
Well I'd recommend upgrading to 2.7 if you can. The re kmod driver is in our repo there and is at v1.98.
However, ultimately, you should use some other NIC.
-
Thanks Steve. We will replace this mini-PC with a VM on an Intel NIC platform.
In every other respect the mini-PC has been great but there is nothing to gain with experimenting on this hardware.