Netgate XG-7100 drops connection every 10-15 minutes



  • Hi all,

    I have a Netgate XG-7100 with pfSense 2.4.4 installed. The device worked well for a number of months, but recently it started dropping the internet connection for 1-3 seconds at a time. I can't find any relevant errors in the logs, but I notice the outages as follows:

    1. Every network client loses internet access at the same time.
    2. When pinging the device during the outage, I see 100% packet loss (only for this device, not for the other devices).
    3. I have checked/changed all the other devices, and they are not causing the issue.

    Any ideas?

    Thanks!



  • Check if there's a loop in your network.


  • LAYER 8 Global Moderator

    On item 2, what do you mean by (only for this device and nothing for the other devices)?

    If you mean other devices can ping each other on the same network, pfSense has nothing to do with that. Do you mean you can ping from one device on the 7100 switch to another device on the same 7100 switch?

    If there is nothing in the pfSense logs, but you cannot reach the pfSense LAN IP via ping, I would check the uplink from your switching environment to pfSense.

    How do you have the switch configured on the 7100, and how is your network connected to the 7100?



  • Hi, thanks for getting back to me.

    By 2 I mean that from one device on the network, I can ping multiple other devices on the same network without packet loss. They are all joined by one switch (not pfSense). The Netgate LAN port is connected to that switch. I don't have a switch configured on the Netgate box, just a WAN (going to my IP router) and one LAN going to an unmanaged switch.

    Thanks


  • LAYER 8 Global Moderator

    So you're using one of the SFP interfaces as your WAN? All of the non-SFP interfaces are part of a switch.

    Or you're using eth1 as WAN, and eth2-8 are LAN, which I believe is the default config.

    If you have a downstream switch and the devices can all ping each other, that has nothing to do with pfSense. You could unplug pfSense or turn it off, and all the devices on your downstream switch could still talk to each other.

    If you're not seeing any log entries (say, your WAN quality going wonky?), I would validate that the uplink from your downstream switch to pfSense is good. Check the cable, and you could also try changing ports, both on your downstream switch and on the 7100 switch port it connects to. Since the downstream switch is unmanaged, it's impossible to check it for issues such as an interface reset (other than watching the lights), errors on the connection, a MAC address flip, or duplex problems. There are lots of things that could happen, but without a managed switch they are hard to look into. You would hope such things would be logged or viewable on the 7100, but I'm not sure of the details for that model; I don't have one to play with :(

    Since you say you're using a dumb switch downstream, I take it you're not doing any sort of LAGG for your uplink, and you only have one port connected from your downstream switch to the 7100 switch.

    Are you sure it's every 10 or 15 minutes, i.e. is this a repeating pattern, like every X minutes? If it were, say, 20, that is pfSense's default ARP cache timeout, and maybe you have something odd going on with that. But if it doesn't repeat at exactly X minutes, that seems unlikely.
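
    To check whether the drops really repeat on a fixed period, you could note the start time of each outage and compare the gaps. A minimal Python sketch, assuming you have the start times in seconds; the 1200 s constant reflects the 20-minute ARP cache default mentioned above (on FreeBSD, which pfSense is built on, this is the `net.link.ether.inet.max_age` sysctl), and the 30 s tolerance is an arbitrary choice:

```python
from statistics import median

# Assumption: pfSense (FreeBSD) keeps ARP entries for 1200 s (20 minutes)
# by default, via the net.link.ether.inet.max_age sysctl.
ARP_MAX_AGE_S = 1200


def outage_gaps(starts):
    """Seconds between consecutive outage start times."""
    starts = sorted(starts)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


def looks_periodic(starts, tolerance_s=30):
    """True if every gap is within tolerance_s of the median gap."""
    gaps = outage_gaps(starts)
    if len(gaps) < 2:
        return False  # too few outages to call it a pattern
    typical = median(gaps)
    return all(abs(gap - typical) <= tolerance_s for gap in gaps)


def matches_arp_timeout(starts, tolerance_s=30):
    """True if the outages repeat at roughly the ARP cache lifetime."""
    if not looks_periodic(starts, tolerance_s):
        return False
    return abs(median(outage_gaps(starts)) - ARP_MAX_AGE_S) <= tolerance_s


# Hypothetical outage start times, in seconds since the first one was noticed.
observed = [0, 1200, 2405, 3600]
print(outage_gaps(observed))          # [1200, 1205, 1195]
print(matches_arp_timeout(observed))  # True
```

    Using the median rather than the mean keeps one misremembered timestamp from skewing the "typical" gap.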

    Also, we are talking IPv4, right, not IPv6? Just for clarification.

    One test you might be able to do is to run a packet capture on pfSense for pings from some device on your LAN. Make sure you set it to store enough packets; it defaults to only 100. Then start a continuous ping from one of your LAN clients to the pfSense LAN IP. After the issue occurs, go back to pfSense, stop the capture, and take a look. Was pfSense still receiving the pings and just not answering during the issue, or did you see no ping requests at all during the issue?
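
    If the capture is long, comparing sequence numbers answers that question directly. A minimal Python sketch, assuming the capture is viewed as tcpdump-style text (lines like `ICMP echo request, id 1, seq 5, length 64`) and that you know which sequence numbers the client sent; the IP addresses in the sample are hypothetical:

```python
import re

# Assumption: tcpdump-style text output, e.g.
# "IP 192.168.1.50 > 192.168.1.1: ICMP echo request, id 1, seq 5, length 64"
ECHO_RE = re.compile(r"ICMP echo (request|reply), id \d+, seq (\d+)")


def parse_capture(lines):
    """Return (request_seqs, reply_seqs) found in the capture text."""
    requests, replies = set(), set()
    for line in lines:
        match = ECHO_RE.search(line)
        if match:
            kind, seq = match.group(1), int(match.group(2))
            (requests if kind == "request" else replies).add(seq)
    return requests, replies


def classify_pings(sent, requests, replies):
    """Split the client's sequence numbers into two failure buckets."""
    sent = set(sent)
    never_arrived = sorted(sent - requests)           # lost before reaching pfSense
    unanswered = sorted((sent & requests) - replies)  # reached pfSense, no reply
    return never_arrived, unanswered


capture = [
    "IP 192.168.1.50 > 192.168.1.1: ICMP echo request, id 1, seq 1, length 64",
    "IP 192.168.1.1 > 192.168.1.50: ICMP echo reply, id 1, seq 1, length 64",
    "IP 192.168.1.50 > 192.168.1.1: ICMP echo request, id 1, seq 3, length 64",
]
requests, replies = parse_capture(capture)
print(classify_pings(range(1, 4), requests, replies))  # ([2], [3])
```

    Sequence numbers in `never_arrived` point at the cable, the downstream switch, or the uplink port; numbers in `unanswered` point at pfSense itself.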

    There are some ping tools that allow faster pings than the default of one per second, so you could get a clearer picture of how long the issue lasts, and whether it really is only 1-3 seconds when it happens.
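
    If you log each probe's send time and whether it got a reply, a few lines of Python can turn that log into outage windows. A minimal sketch, assuming timestamps in milliseconds collected from whatever fast-ping tool you use; the sample data is hypothetical:

```python
def outage_windows(samples):
    """Group consecutive failed probes into (start, end, duration) windows.

    samples: (timestamp_ms, ok) tuples in the order the probes were sent.
    Duration runs from the first failed probe to the next successful one,
    so it is only as precise as the probe interval.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failure of a new window
        elif ok and start is not None:
            windows.append((start, ts, ts - start))
            start = None
    if start is not None:  # still failing when the samples end
        last_ts = samples[-1][0]
        windows.append((start, last_ts, last_ts - start))
    return windows


# Probes every 200 ms (hypothetical data): one 400 ms outage starting at 200 ms.
samples = [(0, True), (200, False), (400, False), (600, True), (800, True)]
print(outage_windows(samples))  # [(200, 600, 400)]
```

    At one probe per second the same outage could only be measured to the nearest second, which is why a faster interval gives a clearer picture.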

    Oh, by the way, how exactly are you testing the loss of internet? Could it be more of a delay in name resolution, say when Unbound is restarting on a DHCP registration? That can present symptoms like the ones you're seeing, where the internet doesn't work for a second, then you refresh your browser and it's working.
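
    One way to tell those two cases apart during an outage is to time a name lookup separately from a connection to a literal IP address. A minimal Python sketch; the probe functions are passed in so the logic can be tried offline, and the 1-second "slow" threshold is an arbitrary assumption:

```python
import time


def timed(probe):
    """Run probe(); return (succeeded, elapsed_seconds)."""
    started = time.monotonic()
    try:
        probe()
        return True, time.monotonic() - started
    except OSError:  # covers socket.gaierror and connection timeouts
        return False, time.monotonic() - started


def diagnose(resolve, connect, slow_s=1.0):
    """Tell a raw connectivity drop apart from a DNS failure or delay."""
    ip_ok, _ = timed(connect)
    if not ip_ok:
        return "connectivity"  # even a literal IP is unreachable
    dns_ok, dns_s = timed(resolve)
    if not dns_ok:
        return "dns-failed"
    if dns_s > slow_s:
        return "dns-slow"
    return "ok"


# Offline demo with stand-in probes instead of real network calls.
def reachable():
    pass


def unreachable():
    raise OSError("simulated timeout")


print(diagnose(resolve=unreachable, connect=reachable))  # dns-failed
```

    For a real check you might pass `resolve=lambda: socket.getaddrinfo("example.com", 443)` and `connect=lambda: socket.create_connection(("1.1.1.1", 443), timeout=2).close()` after `import socket`; both targets are placeholders, so use any hostname and public IP you trust. A lookup may be answered from the client's own cache without ever reaching Unbound.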

