WAN drops after two hours on SuperMicro X8SIE
Hello everyone! I'm having a strange issue with this server and the WAN interface. After exactly 2 hours of use the WAN drops out as if the carrier (Verizon Fios) loses the connection. It doesn't matter what I do with the interface; it always happens after exactly two hours. The errors show that it has no connectivity. For testing I have reinstalled pfSense, installed Untangle, and tried various builds of Sophos XG Home. They all have the exact same issue of losing the WAN connection after 2 hours. No other noticeable issues occur. It so happens that I have two of these servers, so I swapped servers just for testing, and the exact same thing happens. To take things further, I took an old box I had lying around, threw a dual-port Intel card in it, and it works just fine with no loss of connection. The server specs are as follows:
8 GB RAM
Intel Xeon X3430 @ 2.40 GHz
Dual Intel 82574L NIC
BIOS Version 1.2
While the WAN is online the server works wonders. I get 825 Mbps down and 968 Mbps up with no issues. After the two hours I lose all connectivity to the WAN, and after a reboot it works fine again. I have followed the troubleshooting guide: https://www.netgate.com/docs/pfsense/hardware/tuning-and-troubleshooting-network-cards.html and have also tried injecting different drivers, with no luck.
I currently use the second server as a game server running Windows Server 2012 R2 Standard with no issues at all: no packet loss or interruptions after two hours. Any advice would be appreciated, as I have searched through all kinds of documentation, not only for pfSense but for the other vendors as well. I thought I'd give posting a shot before buying a new server :)
Thanks everyone for your time!
What does it show when the WAN fails? No link? No IP?
Can you restore it by reconnecting the WAN cable? Or by bringing the interface down and back up?
Thank you for the reply! The link is up and it still maintains an IP. It does restore if I reset the ONT, down/up the interface, or reboot the system. I'm going to let it error out again to get the exact logs, as rebooting the system wiped out the logs I had. I will post them once the error occurs. Again, thanks for your time!
Okay, as always it dropped after two hours :) Here are all of the results pulled while the event was happening:
Jan 16 23:02:33 rc.gateway_alarm 4064 >>> Gateway alarm: WAN_DHCP (Addr:xx.xx.xxx.x Alarm:1 RTT:2.423ms RTTsd:2.051ms Loss:21%)
Jan 16 23:02:33 check_reload_status updating dyndns WAN_DHCP
Jan 16 23:02:33 check_reload_status Restarting ipsec tunnels
Jan 16 23:02:33 check_reload_status Restarting OpenVPN tunnels/interfaces
Jan 16 23:02:33 check_reload_status Reloading filter
Jan 16 23:02:34 php-fpm 336 /rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Jan 16 23:02:34 php-fpm 336 /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. 'WAN_DHCP6'

Jan 16 21:03:01 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr xx.xx.xxx.x bind_addr xx.xx.xxx.xx identifier "WAN_DHCP "
Jan 16 23:02:33 dpinger WAN_DHCP xx.xx.xxx.x: Alarm latency 2423us stddev 2051us loss 21%
Gateway status:
Name: WAN_DHCP (default)
Gateway: xx.xx.xxx.x
Monitor: xx.xx.xxx.x
RTT: 0ms
RTTsd: 0ms
Loss: 100%
Status: Offline
Description: Interface WAN_DHCP Gateway
CPU usage:
                Minimum   Average   Maximum   Last      95th Percentile
user util.      0.00 %    0.38 %    0.94 %    0.31 %
nice util.      0.00 %    0.00 %    0.04 %    0.00 %
system util.    0.04 %    0.63 %    1.61 %    0.35 %
interrupt       0.00 %    0.18 %    2.81 %    0.00 %
processes       152.39    158.70    163.46    163.46
And WAN status:
Status: up
DHCP: up
MAC Address: 00:25:xx:xx:xx:xx
IPv4 Address: xx.xx.xxx.xx
Subnet mask IPv4: 255.255.255.0
Gateway IPv4: xx.xx.xxx.x
IPv6 Link Local: xxxx::
DNS servers: 127.0.0.1 22.214.171.124 126.96.36.199
MTU: 1500
Media: 1000baseT <full-duplex>
In/out packets: 4206482/1671270 (5.58 GiB/1.22 GiB)
In/out packets (pass): 4206482/1671270 (5.58 GiB/1.22 GiB)
In/out packets (block): 4609/0 (221 KiB/0 B)
In/out errors: 0/0
Collisions: 0
I disabled and re-enabled the interface and it came back up in seconds.
Link is up and interface still has an IP.
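As a quick sanity check on the "exactly two hours" observation, the two dpinger timestamps in the log above can be subtracted. This is only a rough sketch: it assumes the dpinger startup line (21:03:01) marks roughly when the WAN last came up, and the year is a placeholder needed only so the dates parse (syslog lines carry no year). The function name is made up for illustration.

```python
from datetime import datetime, timedelta

# Timestamps copied from the dpinger lines in the log above.
DPINGER_START = "Jan 16 21:03:01"   # dpinger started monitoring WAN_DHCP
GATEWAY_ALARM = "Jan 16 23:02:33"   # first loss alarm (21% > loss_alarm 20%)


def time_until_alarm(start: str, alarm: str, year: int = 2019) -> timedelta:
    """Return the time elapsed between two syslog-style timestamps.

    The year is a placeholder (assumption); it only matters if the two
    timestamps straddle a year boundary. Month names are parsed with %b,
    which expects English abbreviations in the default C locale.
    """
    fmt = "%Y %b %d %H:%M:%S"
    t_start = datetime.strptime(f"{year} {start}", fmt)
    t_alarm = datetime.strptime(f"{year} {alarm}", fmt)
    return t_alarm - t_start


elapsed = time_until_alarm(DPINGER_START, GATEWAY_ALARM)
print(elapsed)  # 1:59:32
```

So the alarm fires just under two hours after monitoring started, which matches what I have been seeing by the clock.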
Hmm, interesting. I don't see anything indicating an ARP issue logged there.
I would take a packet capture on the WAN when it is down and see what is being sent and if anything at all is coming back.
One thing that may happen is that setting promiscuous mode in the pcap brings it back up. We have seen that happen before, but I think only on PPP connections.
I assume you're running 2.4.4p2?
Since the issue is carried between OSes, it seems likely to be a hardware issue. In which case I would check for power saving options that may be enabled. There is a PCIe power saving setting that some BIOSes enable that can behave like this.
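For the packet capture suggested above, here is a minimal sketch of a non-promiscuous capture, so that the capture itself is less likely to mask the problem. It only builds the tcpdump command line; the interface name `em0` and the output path are assumptions (substitute whatever your WAN interface is actually called), and the helper name is made up for illustration.

```python
import shlex


def build_capture_command(interface: str, out_file: str,
                          packet_count: int = 500) -> list[str]:
    """Build a tcpdump argument list for a short WAN capture."""
    return [
        "tcpdump",
        "-n",                      # do not resolve addresses to names
        "-p",                      # do NOT put the interface into promiscuous mode
        "-i", interface,           # capture on this interface
        "-c", str(packet_count),   # stop after this many packets
        "-w", out_file,            # write raw packets to a pcap file
    ]


# 'em0' and the output path are placeholders, not values from this thread.
cmd = build_capture_command("em0", "/tmp/wan-down.pcap")
print(shlex.join(cmd))
```

Run the printed command from a root shell while the WAN is down, then open the file in Wireshark to see whether anything at all is coming back.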