How To Disable/Enable Energy Efficient Ethernet (EEE)?
-
@uplink PingDoctor is an app that runs continuously, runs traceroute, and graphs the results. This allows me to visualize the point and duration of the outage.
Unfortunately, the outages don't seem to follow a pattern, although they seem to happen more often during VoIP calls (WhatsApp/Zoom/Slack/etc.).
Here are the logs from the time of an outage:
Mar 11 06:50:14 rc.gateway_alarm 49054 >>> Gateway alarm: VPN_NAME_REMOVED-OPENVPN_VPNV4 (Addr:100.78.0.1 Alarm:1 RTT:5.134ms RTTsd:.201ms Loss:22%)
Mar 11 06:50:14 check_reload_status 439 updating dyndns VPN_NAME_REMOVED-OPENVPN_VPNV4
Mar 11 06:50:14 check_reload_status 439 Restarting IPsec tunnels
Mar 11 06:50:14 check_reload_status 439 Restarting OpenVPN tunnels/interfaces
Mar 11 06:50:14 check_reload_status 439 Reloading filter
Mar 11 06:50:14 rc.gateway_alarm 52480 >>> Gateway alarm: FIBER_PPPOE (Addr:FIBER_IP_ADDR_REMOVED Alarm:1 RTT:3.991ms RTTsd:4.251ms Loss:22%)
Mar 11 06:50:14 check_reload_status 439 updating dyndns FIBER_PPPOE
Mar 11 06:50:14 check_reload_status 439 Restarting IPsec tunnels
Mar 11 06:50:14 check_reload_status 439 Restarting OpenVPN tunnels/interfaces
Mar 11 06:50:14 check_reload_status 439 Reloading filter
Mar 11 06:50:15 php-fpm 78891 /rc.openvpn: MONITOR: FIBER_PPPOE has packet loss, omitting from routing group Failover
Mar 11 06:50:15 php-fpm 78891 FIBER_IP_ADDR_REMOVED|FIBER_GW_ADDR_REMOVED|FIBER_PPPOE|3.903ms|4.181ms|23%|down|highloss
Mar 11 06:50:15 php-fpm 78891 /rc.openvpn: Gateway, switch to: ADSL_PPPOE
Mar 11 06:50:15 php-fpm 78891 /rc.openvpn: Default gateway setting Interface ADSL_PPPOE Gateway as default.
Mar 11 06:50:15 php-fpm 14462 /rc.openvpn: Gateway, switch to: ADSL_PPPOE
Mar 11 06:50:15 php-fpm 14462 /rc.openvpn: Default gateway setting Interface ADSL_PPPOE Gateway as default.
Mar 11 06:50:15 php-fpm 62663 /rc.filter_configure_sync: Gateway, switch to: ADSL_PPPOE
Mar 11 06:50:15 php-fpm 78891 /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Mar 11 06:50:15 php-fpm 78891 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use VPN_NAME_REMOVED-OPENVPN_VPNV4.
Mar 11 06:50:15 php-fpm 14462 /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Mar 11 06:50:15 php-fpm 14462 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use FIBER_PPPOE.
Mar 11 06:50:15 php-fpm 14462 /rc.openvpn: OpenVPN: Resync client1 VPN_NAME_REMOVED-
Mar 11 06:50:15 php-fpm 14462 OpenVPN terminate old pid: 76667
Mar 11 06:50:15 kernel ovpnc1: link state changed to DOWN
Mar 11 06:50:15 check_reload_status 439 Reloading filter
Mar 11 06:50:16 php-fpm 62663 /rc.filter_configure_sync: GW States: Gateway is down but its IP address cannot be located. Skipping state kill.: VPN_NAME_REMOVED-OPENVPN_VPNV4
Mar 11 06:50:16 php-fpm 14462 OpenVPN PID written: 8169
Mar 11 06:50:17 kernel ovpnc1: link state changed to UP
Mar 11 06:50:17 check_reload_status 439 rc.newwanip starting ovpnc1
Mar 11 06:50:18 php-fpm 30069 /rc.newwanip: rc.newwanip: Info: starting on ovpnc1.
Mar 11 06:50:18 php-fpm 30069 /rc.newwanip: rc.newwanip: on (IP address: 100.78.0.61) (interface: VPN_NAME_REMOVED-OPENVPN[opt2]) (real interface: ovpnc1).
Mar 11 06:50:20 php-fpm 30069 /rc.newwanip: MONITOR: FIBER_PPPOE is available now, adding to routing group Failover
Mar 11 06:50:20 php-fpm 30069 FIBER_IP_ADDR_REMOVED|FIBER_GW_ADDR_REMOVED|FIBER_PPPOE|5.186ms|2.301ms|0.0%|online|none
Mar 11 06:50:20 php-fpm 30069 /rc.newwanip: Gateway, switch to: FIBER_PPPOE
Mar 11 06:50:20 php-fpm 30069 /rc.newwanip: Default gateway setting Interface FIBER_PPPOE Gateway as default.
Mar 11 06:50:20 php-fpm 30069 /rc.newwanip: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Mar 11 06:50:20 php-fpm 30069 /rc.newwanip: IP Address has changed, killing states on former IP Address 100.78.7.90.
Mar 11 06:50:20 php-fpm 30069 /rc.newwanip: Creating rrd update script
Mar 11 06:50:22 php-fpm 30069 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 100.78.7.90 -> 100.78.0.61 - Restarting packages.
Mar 11 06:50:22 check_reload_status 439 Starting packages
Mar 11 06:50:22 check_reload_status 439 Reloading filter
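For anyone comparing several outages, the gateway alarm lines in a log like the one above can be pulled out with a short script. This is only a rough sketch: the regular expression is written against the two rc.gateway_alarm lines in this post, and the function name and dictionary keys are my own, so other pfSense versions may need a different pattern.

```python
import re

# Pattern written against the two rc.gateway_alarm lines quoted in this thread.
# It is an assumption that other log lines follow exactly the same layout.
ALARM_RE = re.compile(
    r"Gateway alarm: (?P<gateway>\S+) \(Addr:(?P<addr>\S+) Alarm:(?P<alarm>\d+) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>[\d.]+)%\)"
)


def parse_gateway_alarms(log_text):
    """Return one dict per gateway alarm found anywhere in log_text."""
    alarms = []
    for match in ALARM_RE.finditer(log_text):
        alarms.append({
            "gateway": match.group("gateway"),
            "addr": match.group("addr"),
            "alarm": int(match.group("alarm")),
            "rtt_ms": float(match.group("rtt")),
            "rttsd_ms": float(match.group("rttsd")),
            "loss_pct": float(match.group("loss")),
        })
    return alarms


# The two alarm lines from the log above, copied verbatim.
SAMPLE = (
    "Mar 11 06:50:14 rc.gateway_alarm 49054 >>> Gateway alarm: "
    "VPN_NAME_REMOVED-OPENVPN_VPNV4 (Addr:100.78.0.1 Alarm:1 "
    "RTT:5.134ms RTTsd:.201ms Loss:22%)\n"
    "Mar 11 06:50:14 rc.gateway_alarm 52480 >>> Gateway alarm: "
    "FIBER_PPPOE (Addr:FIBER_IP_ADDR_REMOVED Alarm:1 "
    "RTT:3.991ms RTTsd:4.251ms Loss:22%)\n"
)

for alarm in parse_gateway_alarms(SAMPLE):
    print(alarm["gateway"], alarm["loss_pct"])
```

Both alarms here carry the same timestamp and the same loss figure, which is easier to spot once the fields are split out.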
-
That looks like it just starts dropping packets on the fibre WAN and switches to the DSL WAN. That is the expected behaviour in that situation.
The first thing I would do there is change the monitor IP on the Fiber WAN to something external. Using the default WAN gateway IP can produce bad data. The gateway may not respond to pings when loaded. Or at all for that matter, though here it clearly does most of the time.
-
@stephenw10 I had it on something external for a long time. Will switch it back and try again. Also, I disabled monitoring totally for a bit - also didn't help.
-
With monitoring disabled, did the PPPoE session fail entirely and then reconnect? That isn't what the log above shows.
-
@stephenw10 No - I currently do have a monitor in place. I was just commenting that in the past I had tried removing the monitor and I still experienced the same issue
-
OK, then we'd need to see what is logged in that situation. Without monitoring it would not throw a gateway alarm, and the alarm is what's causing the issues you're seeing now. That assumes no loss of link was logged before the section of the logs you posted.
-
@stephenw10 there is a small snippet here; happy to share anything else that can help
-
Those logs show the first thing logged is an alarm from the gateway monitoring. So clearly at that time it was running the gateway monitor.
From what we can see there the NIC did not lose link. The WAN just started dropping packets. Unless there were entries before that showing it did lose link?
So it could simply be that the WAN is lossy under load and the gateway monitoring values should be adjusted to match that so alarms are not triggered. Or just disabled.
-
@stephenw10 said in How To Disable/Enable Energy Efficient Ethernet (EEE)?:
the WAN is lossy under load
a. I didn't notice any load (CPU or mem)
b. what would cause both WANs to drop at exactly the same time?!
@stephenw10 said in How To Disable/Enable Energy Efficient Ethernet (EEE)?:
gateway monitoring values should be adjusted
I'm happy to try - what would you suggest?
-
The WANs share the same NIC? Using VLANs? Same ISP? Something upstream maybe?
Try setting packet loss at 50%. Anything hitting that is almost certainly broken.
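To make the effect of that change concrete, here is a minimal sketch of how a loss reading compares against a low and a high threshold. It is not pfSense's actual code: the function name, the status labels, the 10%/20% defaults and the use of >= are all assumptions for illustration.

```python
def loss_status(loss_pct, low=10.0, high=20.0):
    """Classify a packet-loss reading against two thresholds.

    low and high are assumed defaults, not values taken from this thread.
    """
    if loss_pct >= high:
        return "down"      # gateway would be dropped from the routing group
    if loss_pct >= low:
        return "warning"   # loss is noted but the gateway stays in use
    return "online"


# The 23% reading from the log, against the assumed defaults and a 50% limit.
print(loss_status(23.0))             # down
print(loss_status(23.0, high=50.0))  # warning
```

With the high threshold at 50%, the 22-23% loss seen in the log would no longer take the gateway out of the failover group.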
-
@stephenw10 said in How To Disable/Enable Energy Efficient Ethernet (EEE)?:
Try setting packet loss at 50%. Anything hitting that is almost certainly broken.
I did, same intermittent dropouts
-
OK and still no link losses logged? Just gateway alarms?
What about the interface stats in Status > Interfaces? Does it show errors or dropped packets? How about in the output of netstat -i directly?
-
MTU 1492
In/out packets 27805509/14754894 (27.99 GiB/6.21 GiB)
In/out packets (pass) 27805509/14754894 (27.99 GiB/6.21 GiB)
In/out packets (block) 5606/0 (222 KiB/0 B)
In/out errors 0/0
Collisions 0
In netstat -i, Ierrs/Idrop/Oerrs are all 0 or -
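Checking those columns by eye gets tedious on a box with many interfaces, so here is a rough sketch that flags any row of netstat -i output with a non-zero Ierrs, Idrop or Oerrs. It assumes the FreeBSD layout where the last six columns are Ipkts, Ierrs, Idrop, Opkts, Oerrs and Coll. The sample text and the interface names nic0 and nic1 are made up for illustration and are not from this system.

```python
def rows_with_errors(netstat_output):
    """Return (name, ierrs, idrop, oerrs) for rows with a non-zero counter.

    Assumes the last six columns are Ipkts Ierrs Idrop Opkts Oerrs Coll.
    """
    flagged = []
    for line in netstat_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue
        # Read from the right because the Address column can be empty.
        _ipkts, ierrs, idrop, _opkts, oerrs, _coll = fields[-6:]
        try:
            # "-" means the counter does not apply to that row; treat as 0.
            counters = [0 if v == "-" else int(v) for v in (ierrs, idrop, oerrs)]
        except ValueError:
            continue  # header line or anything else that is not numeric
        if any(counters):
            flagged.append((fields[0], *counters))
    return flagged


# Hypothetical sample output, not taken from the system in this thread.
SAMPLE = """\
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
nic0   1500 <Link#1>      00:00:5e:00:53:01     1000     0     0      900     0     0
nic0      - 192.0.2.0/24  192.0.2.1              500     -     -      400     -     -
nic1   1500 <Link#2>      00:00:5e:00:53:02     2000     3     0     1800     1     0
"""

print(rows_with_errors(SAMPLE))
```

On the real firewall the text could come from subprocess.run(["netstat", "-i"], capture_output=True, text=True).stdout. An empty list would match what is reported above.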
-
On the parent NIC? Both?
-
@stephenw10 here is a more detailed look:
-
No errors or drops on anything then. Which NICs are the PPPoE WANs using?
Could just be something upstream like a bad modem.
-
@stephenw10 icg1 & icg2. Yeah, it could be, but what are the chances that they are both problematic and both drop at the exact same time?
-
Same ISP?
-
@stephenw10 different ISP, different modem make, and different tech (ADSL vs fiber)
-
Hmm, very unlikely then. So neither NIC ever loses link. The PPPoE sessions do not get disconnected. The PPPoE connections simply stop passing traffic.
Do you see missing LCP echoes logged in the ppp log when this happens? It has to drop 5 before the ppp link is restarted, and that doesn't appear to be happening, but if the NICs stop passing traffic it would drop some LCP packets.
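As a rough illustration of why a short stall would not restart the link, here is a small model of the rule above: the link is only restarted once 5 echoes have been missed. It assumes the misses must be consecutive and that the counter resets on any reply. That detail, and the function name, are my own assumptions and not taken from the ppp daemon.

```python
def link_restarts(echo_replies, max_missed=5):
    """Return the indices at which the link would be restarted.

    echo_replies is a sequence of booleans, True when an LCP echo was answered.
    Assumes misses must be consecutive and the counter resets on a reply.
    """
    missed = 0
    restarts = []
    for index, answered in enumerate(echo_replies):
        if answered:
            missed = 0
            continue
        missed += 1
        if missed >= max_missed:
            restarts.append(index)
            missed = 0  # the session starts again with a clean counter
    return restarts


# Three missed echoes in a row: logged as missing, but no restart.
print(link_restarts([True, False, False, False, True]))   # []
# Five in a row: the link would be restarted at the fifth miss.
print(link_restarts([False, False, False, False, False])) # [4]
```

So a stall of a few seconds could show up as missed echoes in the ppp log without the session ever being torn down, which would fit what the logs show so far.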
Do you have access to the modems on a private IP at all? You could assign the parent interfaces and use those modem IPs as gateways; they would then be monitored separately from the PPP link.