WAN drops after two hours SuperMicro X8SIE



  • Hello everyone! I'm having a strange issue with this server and its WAN interface. After exactly 2 hours of use the WAN drops out as if the carrier (Verizon FiOS) loses the connection. It doesn't matter what I do with the interface; it always happens after exactly two hours. The errors show that it has no connectivity. For testing I have reinstalled pfSense, installed Untangle, and tried various builds of Sophos XG Home. They all have the exact same issue of losing the WAN connection after 2 hours. No other noticeable issues occur. It so happens that I have two of these servers. I even swapped servers just for testing and the exact same thing happens. To take things further, I then took an old box I had lying around, threw a dual-port Intel card in, and it works just fine with no loss of connection. The server specs are as follows:

    Supermicro X8SIE
    8 GB RAM
    Intel Xeon X3430 @ 2.40 GHz
    Dual Intel 82574L NIC
    BIOS Version 1.2

    While the WAN is online the server works wonders. I get 825 down and 968 up with no issues. After the two hours I lose all connectivity to the WAN, and after a reboot it works fine again. I have followed the troubleshooting guide at https://www.netgate.com/docs/pfsense/hardware/tuning-and-troubleshooting-network-cards.html and have also tried injecting different drivers, with no luck.

    I currently use the second server as a game server running Windows Server 2012 R2 Standard with no issues at all: no packet loss or interruptions after two hours. Any advice would be appreciated, as I have searched through all kinds of documentation, not only for pfSense but for the other vendors as well. I thought I'd give posting a shot before buying a new server :)

    Thanks everyone for your time!


  • Netgate Administrator

    What does it show when the WAN fails? No link? No IP?

    Can you restore it by re-connecting the WAN cable? Or by down/up -ing the interface?

    Steve



  • Thank you for the reply! The link is up and it still maintains an IP. It does restore if I reset the ONT, down/up the interface, or reboot the system. I rebooted the system and that wiped out the logs I had, so I'm going to let it fail again to capture the exact logs. I will post them once the error occurs. Again, thanks for your time!
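
    Since a down/up of the interface restores connectivity, a stopgap watchdog could automate that while the root cause is being tracked down. The following is only a minimal sketch, not a fix: it assumes a Python 3 interpreter is available on the firewall (a stock install may not provide one), and `GATEWAY`, `WAN_IF`, and the thresholds are hypothetical placeholders.

```python
import subprocess
import time

GATEWAY = "192.0.2.1"  # hypothetical: replace with the WAN gateway address
WAN_IF = "em0"         # hypothetical: replace with the WAN interface name
FAIL_THRESHOLD = 5     # consecutive failed pings before bouncing the interface


def gateway_reachable(host, timeout_s=15.0):
    """Send one ICMP echo request; True only if ping exits with status 0."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def next_failure_count(current, reachable):
    """Reset the counter on success, increment it on failure."""
    return 0 if reachable else current + 1


def bounce_commands(interface):
    """ifconfig invocations that take the interface down, then up again."""
    return [["ifconfig", interface, "down"], ["ifconfig", interface, "up"]]


def watch(interval_s=10.0):
    """Poll the gateway forever and bounce the interface after repeated failures."""
    failures = 0
    while True:
        failures = next_failure_count(failures, gateway_reachable(GATEWAY))
        if failures >= FAIL_THRESHOLD:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, "gateway unreachable, bouncing", WAN_IF)
            for cmd in bounce_commands(WAN_IF):
                subprocess.run(cmd, check=False)
            failures = 0
        time.sleep(interval_s)
```

    Calling `watch()` as root starts the loop. The timestamps it prints would also give an independent record of when each drop occurs.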



  • Okay, as always it dropped after two hours :) Here are all of the results pulled while the event was happening:

    Jan 16 23:02:33	rc.gateway_alarm	4064	>>> Gateway alarm: WAN_DHCP (Addr:xx.xx.xxx.x Alarm:1 RTT:2.423ms RTTsd:2.051ms Loss:21%)
    Jan 16 23:02:33	check_reload_status		updating dyndns WAN_DHCP
    Jan 16 23:02:33	check_reload_status		Restarting ipsec tunnels
    Jan 16 23:02:33	check_reload_status		Restarting OpenVPN tunnels/interfaces
    Jan 16 23:02:33	check_reload_status		Reloading filter
    Jan 16 23:02:34	php-fpm	336	/rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
    Jan 16 23:02:34	php-fpm	336	/rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. 'WAN_DHCP6'
    
    
    Jan 16 21:03:01	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr xx.xx.xxx.x bind_addr xx.xx.xxx.xx identifier "WAN_DHCP "
    Jan 16 23:02:33	dpinger		WAN_DHCP xx.xx.xxx.x: Alarm latency 2423us stddev 2051us loss 21%
    
    
    Name	Gateway	Monitor	RTT	RTTsd	Loss	Status	Description
    WAN_DHCP (default)	xx.xx.xxx.x	xx.xx.xxx.x	0ms	0ms	100%	Offline	Interface WAN_DHCP Gateway
    
    
    
                    Minimum   Average   Maximum   Last      95th Percentile
    user util.      0.00 %    0.38 %    0.94 %    0.31 %
    nice util.      0.00 %    0.00 %    0.04 %    0.00 %
    system util.    0.04 %    0.63 %    1.61 %    0.35 %
    interrupt       0.00 %    0.18 %    2.81 %    0.00 %
    processes       152.39    158.70    163.46    163.46
    
    

    And WAN status:

    Status: up
    DHCP: up
    MAC Address: 00:25:xx:xx:xx:xx
    IPv4 Address: xx.xx.xxx.xx
    Subnet mask IPv4: 255.255.255.0
    Gateway IPv4: xx.xx.xxx.x
    IPv6 Link Local: xxxx::
    DNS servers: 127.0.0.1, 1.1.1.1, 8.8.8.8
    MTU: 1500
    Media: 1000baseT <full-duplex>
    In/out packets: 4206482/1671270 (5.58 GiB/1.22 GiB)
    In/out packets (pass): 4206482/1671270 (5.58 GiB/1.22 GiB)
    In/out packets (block): 4609/0 (221 KiB/0 B)
    In/out errors: 0/0
    Collisions: 0
    

    I disabled and re-enabled the interface and it came back up within seconds.
    Link is up and interface still has an IP.
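
    As a quick check on the "exactly two hours" observation, the two dpinger timestamps in the log above can be subtracted. This is a small sketch; the year is an assumption (syslog lines carry none), and the dpinger start line is taken as a rough proxy for when the WAN came up.

```python
from datetime import datetime

ASSUMED_YEAR = 2019  # assumption: the log lines do not include a year


def parse_stamp(stamp, year=ASSUMED_YEAR):
    """Parse a syslog-style 'Mon DD HH:MM:SS' timestamp."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


monitor_started = parse_stamp("Jan 16 21:03:01")  # dpinger start line
alarm_raised = parse_stamp("Jan 16 23:02:33")     # dpinger alarm line

uptime = alarm_raised - monitor_started
print(uptime)  # 1:59:32
```

    That is 28 seconds short of two hours, consistent with the drop being tied to a fixed timer rather than to load.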


  • Netgate Administrator

    Hmm, interesting. I don't see anything indicating an ARP issue logged there.

    I would take a packet capture on the WAN when it is down and see what is being sent and if anything at all is coming back.

    One thing that may happen is that it comes back up if you set promiscuous mode in the pcap. We have seen that happen before, but I think only on PPP connections.
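
    For a capture from the shell rather than the GUI, the tcpdump command line could be built along these lines. This is a sketch only: the interface name, gateway address, and output path are hypothetical placeholders. The `-p` flag asks tcpdump not to put the interface into promiscuous mode, so the capture itself is less likely to mask the fault.

```python
import shlex


def tcpdump_command(interface, out_file, gateway=None, promiscuous=False, count=1000):
    """Build a tcpdump argument list for capturing on the WAN interface."""
    cmd = ["tcpdump"]
    if not promiscuous:
        cmd.append("-p")  # do not put the interface into promiscuous mode
    # -n: no name resolution, -c: stop after `count` packets, -w: write pcap file
    cmd += ["-i", interface, "-n", "-c", str(count), "-w", out_file]
    if gateway:
        # Limit the capture to gateway traffic plus ARP
        cmd += ["host", gateway, "or", "arp"]
    return cmd


# Hypothetical values: substitute the real WAN interface and gateway address.
cmd = tcpdump_command("em0", "/tmp/wan-down.pcap", gateway="192.0.2.1")
print(shlex.join(cmd))
```

    Running one capture with `promiscuous=False` and, if nothing comes back, a second with `promiscuous=True` would show whether promiscuous mode alone revives the link.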

    I assume you're running 2.4.4p2?

    Since the issue carries across OSes, it seems likely to be a hardware issue. In which case I would check for any power saving options that may be enabled. There is a PCIe power saving setting that some BIOSes enable that can behave like this.

    Steve