Strange network drop for 1 minute every hour
-
Hi everyone,
I've been seeing strange behavior for a couple of months now. My outbound network drops every hour for about 45 to 70 seconds and then comes up again. I'm running pfSense 2.7.0-RELEASE (amd64) on FreeBSD 14.0-CURRENT. I have a single WAN and a single LAN network card. The network cards seem to be OK (both Intel). The pfSense box is a single bare-metal machine, nothing virtualized. The incoming line is fiber-optic 100/100 Mbps and is split directly on my Cisco switch onto two VLAN ports. One is TV and is forwarded directly to the TV boxes without any interference from pfSense. The other VLAN goes directly into pfSense, where the entire network is handled.
I already replaced the switch, but that didn't solve the issue. Also the logs of the switches don't indicate that anything is wrong.
Initially I blamed T-Mobile for this issue. They sent someone to check the lines, but everything was OK.
- I can ping my internal network devices when it happens
- I can not ping 8.8.8.8 or any outside address
- I can not ping my pfsense
I concluded that pfSense is my bottleneck, as I also pinged every device between my (wired) computer and pfSense. All are pingable until I reach pfSense.
During the hourly outage everything is green, with no outages indicated anywhere. Network traffic is coming in but not going out. The logs give no hint of where to look or what to look for.
As this has been going on for months now, I have bought an Intel NUC with two LAN ports. Over the next couple of days I will test this machine, because I'm desperate to get this working again.
If someone has had a similar experience I would love to hear about it. If you have any hint about where to look and what to look for, I'm also very happy to get any ideas. My pfSense has been running for 7 years now and I never had this kind of issue.
One more strange thing. When I shut down my pfSense and unplug it (power cord disconnected) for, say, 15 minutes and then start pfSense again, the issue does not start 15 minutes later but will still take place at 10 minutes after the full hour. This is something I can't wrap my head around!
Any ideas or suggestions are welcome.
Many thanks,
Chris
-
@cforker said in Strange network drop for 1 minute every hour:
will still take place at 10 minutes after the full hour
My suggestion is to install the Cron package and look at what job starts at 10 minutes past every hour - that could be a start.
Regards,
fireodo
-
Another suggestion.
Open a first console / SSH, option 8.
tail -f /var/log/system.log
Open a second SSH session, option 8 :
tail -f /var/log/resolver.log
and now wait and watch.
The answer will scroll before you.
edit :
Wait ....
@cforker said in Strange network drop for 1 minute every hour:
I can not ping my pfsense
The LAN network goes down ?!
You are using a cabled connection, right ? (Wifi is nice if it works, but when you notice the slightest problem, stop using it).
@cforker said in Strange network drop for 1 minute every hour:
Initially I blamed T-Mobile for this issue. They sent someone to check the lines, but everything was OK
You might need them if you suspect your WAN is bad. That doesn't stop pfSense from working, nor does it impact your LAN network.
@cforker said in Strange network drop for 1 minute every hour:
Also the logs of the switches
Logs of a switch ? I didn't know switches had logs, as these are dumb devices.
Or do you use a smart switch ?
edit : Humm, probably, you use VLANs.
Then add to the test : replace this switch with a dumb one during the test. Also undo the VLANs during the tests.
@cforker said in Strange network drop for 1 minute every hour:
I can not ping 8.8.8.8 or any outside address
I can not ping my pfsense
From your PC on your pfSense LAN, if you can not ping pfSense (using the IP "192.168.1.1", right ? Not the pfSense host name !), then it is normal that you can't ping 8.8.8.8 either.
-
Switches certainly can have logs, and if they're Cisco probably do. Do the switch logs show the link dropping? Do the pfSense logs show that?
-
I agree managed switches most likely have a log, and can log up/down on interfaces, etc. But these $40 smart switches that can do VLANs most likely do not - at least nothing of much worth, like up/down logging of interfaces..
Looking at my tplink - I can not find any logs anywhere in it.. But my cisco
12-Jul-2023 23:16:59 :%LINK-I-Up: gi10
12-Jul-2023 23:16:58 :%LINK-W-Down: gi10
As example of up down logging.. plus lots of other good info, etc.
-
Hi fireodo,
I did that, and pretty soon I could rule it out because the schedule moves up by 1 or 2 seconds every hour. So if it happens at 12:10 today, in a few days it could already be 12:12. The schedule is moving, so this is not cron.
Many thanks anyway for your quick reply :-)
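
To spell out the reasoning: a cron job fires at a fixed wall-clock minute, so it cannot drift, while a timer whose period is slightly longer than an hour slips a little every cycle. A minimal sketch in Python, assuming a hypothetical period of 3602 seconds (2 seconds of drift per hour); the start date and the function names are made up for illustration:

```python
from datetime import datetime, timedelta

def predicted_outages(first_seen, period_s, count):
    """Return the next `count` event times for a timer with a fixed period in seconds."""
    return [first_seen + timedelta(seconds=period_s * i) for i in range(1, count + 1)]

def drift_per_day(period_s):
    """Seconds the event slips past the wall-clock hour over 24 cycles."""
    return (period_s - 3600) * 24

first = datetime(2023, 7, 12, 12, 10, 0)      # hypothetical first observation
two_days_later = predicted_outages(first, 3602, 48)[-1]
print(two_days_later.strftime("%H:%M:%S"))    # 12:11:36
print(drift_per_day(3602))                    # 48
```

With these assumed numbers the event lands at 12:11:36 after two days, which matches the kind of slow shift described above and is not something a fixed cron schedule would produce.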
-
Hi Gertjan,
Nope, not using Wifi. My Cisco switches are linked over fibre optic. Pinging them is fine, so no issue there. Everything is either fibre optic or Cat 6 cabling. Testing the cables also shows that everything is fine, so no issue there either. When pinging, I have several terminals open logging the pings into a text file. One is pinging 192.168.1.1, another 8.8.4.4, and the rest all the switches (including the one my pfSense is connected to). pfSense and 8.8.4.4 are not pingable at the moment of the issue. I also forgot to mention that the schedule of the "distortion" moves up by one or two seconds every hour, meaning that when an error happens at 12:10 today, in two days it will be 12:11 or even 12:12.
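
For anyone repeating this kind of ping logging, a rough Python sketch of how outage windows could be pulled out of the results afterwards. The format of the logged text file is not given here, so this assumes the log has already been parsed into hypothetical `(unix_seconds, reply_ok)` pairs; `outage_windows` is an illustrative name, not an existing tool:

```python
def outage_windows(samples):
    """Group consecutive failed probes into (start, end) windows.

    samples: iterable of (unix_seconds, reply_ok) pairs, sorted by time.
    start is the first failed probe, end is the first successful probe
    after it, or None if the target was still down when the log ended.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # outage begins
        elif ok and start is not None:
            windows.append((start, ts))     # outage ends
            start = None
    if start is not None:
        windows.append((start, None))       # still down at end of log
    return windows

# Made-up samples: one probe per second, down from t=1 until t=3.
print(outage_windows([(0, True), (1, False), (2, False), (3, True)]))
```

Comparing the windows per target (gateway, outside address, each switch) shows at a glance which hop stops answering first.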
T-Mobile is saying that everything is in order, and yes, that would not explain why my LAN is going down every hour for a short while (45-90 seconds). They wanted to investigate further but stopped when someone mentioned that I have more than 100 devices in my network. They multiplied every device by 10 Mbit and suggested upgrading. But 100/100 is the max I can get here, and my outbound or inbound traffic is rarely higher than 60 Mbit, which could not explain this behaviour. Anyway, explaining to some ***** non-technical guy how things work was a waste of time, and I stopped communicating with them as they wouldn't help me any further.
Yes, a better switch has logs. You can find a lot in there, especially if T-Mobile is changing settings on their fibre-optic end. But no, there is nothing in there that gives me any indication of what is going on. From this log I found out that traffic is coming in on the fibre-optic side and into the LAN side via pfSense. So inbound is working during the downtime, but outbound is not. I only use VLANs on the incoming side to separate 300 (internet) and 640 (TV) from the fibre optic. Internally I do not use any VLANs. I have replaced the switches, but then again they should also show something in their logs or go down as well, which they do not. Cisco is quite reliable with this.
Not sure what you mean by "can not ping the hostname", but when everything is up I can ping both the hostname and the IP address of pfSense. I can also ping any external address (if not dropped). During the outage I'm not able to ping the IP or the hostname, and of course not any outside address either.
Cheers,
Chris
-
I still want to see the pfSense logs I mentioned above.
Interfaces going down is a hefty event. That will get logged, even if it was the device on the other side to the NIC that brought down the connection.
And if the source of the issue was on the pfSense side (example : snort or suricata restarting on a LAN interface) then the system log would show that right away.
So, as always : where are the logs ? ?
-
~1h with a few seconds incrementing sounds a lot like it could be DHCP lease renewal. That shouldn't drop the link unless the server is somehow passing an invalid lease time or similar.
Check the dhcp and system logs.
Steve
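
As background for the DHCP theory: RFC 2131 has a client start renewing at T1 (by default 50% of the lease time) and rebinding at T2 (by default 87.5%), unless the server sends other values. A small Python sketch of that arithmetic; the 7200 second lease is only an assumed example value, so check what your own server actually hands out:

```python
def dhcp_timers(lease_s):
    """Default RFC 2131 timers for a lease of `lease_s` seconds."""
    return {
        "T1_renew": 0.5 * lease_s,      # client starts unicast renewal
        "T2_rebind": 0.875 * lease_s,   # client falls back to broadcast rebinding
        "expire": float(lease_s),       # lease is gone, client must rediscover
    }

print(dhcp_timers(7200))
# {'T1_renew': 3600.0, 'T2_rebind': 6300.0, 'expire': 7200.0}
```

With that assumed lease, renewal comes around once an hour, which is why an hourly event with a few seconds of creep looks like DHCP at first glance.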
-
@stephenw10 I was going to say the same thing.. It does sound like it could be a DHCP problem: the lease expires and the client has to do a discover, which would add a few seconds to the process, etc.
-
Hi Gertjan,
As you suggested, doing a tail on system.log revealed the cause!
There was a little entry saying:
09:18:28 proxy kernel: arp: ec:0b:ae:9f:1b:8f is using my IP address 192.168.1.1 on em0!
After a little tracing of this MAC address, I found out that the root of all this evil is my airco! It contains a Wifi module with that exact address. I blocked it from accessing the wifi access points and voila, no more disruption in my network!
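
For anyone who wants to search a log for this message instead of watching it scroll by, here is a minimal Python sketch. The regular expression is written against the one log line quoted above, so treat the pattern as an assumption about the message format; `find_arp_conflicts` is just an illustrative name:

```python
import re

# Matches kernel lines like:
#   arp: ec:0b:ae:9f:1b:8f is using my IP address 192.168.1.1 on em0!
ARP_CONFLICT = re.compile(
    r"arp: (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) is using my IP address "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) on (?P<iface>\w+)!",
    re.IGNORECASE,
)

def find_arp_conflicts(lines):
    """Yield (mac, ip, iface) for every ARP-conflict message found in `lines`."""
    for line in lines:
        match = ARP_CONFLICT.search(line)
        if match:
            yield match.group("mac").lower(), match.group("ip"), match.group("iface")

sample = [
    "09:18:28 proxy kernel: arp: ec:0b:ae:9f:1b:8f is using my IP address 192.168.1.1 on em0!",
]
print(list(find_arp_conflicts(sample)))
# [('ec:0b:ae:9f:1b:8f', '192.168.1.1', 'em0')]
```

Passing an open file handle for /var/log/system.log instead of `sample` should work the same way, since the function only iterates over lines.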
Question now remains why a remote wifi module is trying to take over my Pfsense address. But that is not for this forum anymore.
Thank you for pointing me in the right direction. I'm very thankful for that, as it drove me crazy. Sometimes you just need someone else's perspective to find the right solution.
Again many thanks all for helping me out here.
Cheers,
Chris
-
@cforker said in Strange network drop for 1 minute every hour:
Question now remains why a remote wifi module is trying to take over my Pfsense address
Easy to solve.
It's pretty obvious the airco isn't using DHCP, as the DHCP server in the network (pfSense) would not give out its own 192.168.1.1.
So, set up a PC using 192.168.1.2, mask 255.255.255.0, DNS : don't care, and gateway : don't care.
Remove the link between the AP and pfSense.
Now, have the airco connect to the AP using wifi.
Connect the PC to the AP.
Now, on your PC, you can access the airco's GUI.
Activate the airco's DHCP mode.
Restore DHCP on your PC.
Reconnect the AP to pfSense.
Check in the pfSense DHCP server (LAN) that the airco - you know it's ec:0b:ae:9f:1b:8f - gets an IP from the DHCP server.
You're good for the next issue.
Btw : if the airco can't use other IP settings, and it can only use "192.168.1.1" : throw it out of the window. Please mention type and brand : this one can't be called 'connected' as it is a network destroyer.
09:18:28 proxy kernel: arp: ec:0b:ae:9f:1b:8f is using my IP address 192.168.1.1 on em0!
You agree that that line says a lot .... and is easy to read, and understand.
Logs are very useful.
-
Hi Gertjan
And this is the problem, because it has 192.168.1.62 assigned. So it is getting an IP from DHCP, but somehow it tries to get 192.10.1.1 every 60 minutes and 2 seconds. In my lease table it is assigned and active (not at the moment, obviously). This concerns the Mitsubishi WF-RAC Wifi module. I also cannot reach it through an interface, only through an app.
By the way, I assigned a static IP from Pfsense and now it seems to work just fine.
Regards,
Chris
-
@cforker said in Strange network drop for 1 minute every hour:
Mitsubishi WF-RAC Wifi module
Oh ... great.
I love the Mitsubishi stuff, using two City Line "big towers".
We have this - 18 years old ( yep, the system just won't die ...) and it uses a 10 Mbit /sec half duplex.
But it uses a DHCP client that behaves well.
No IPv6 of course.
Btw : critical stuff like coffee machines, airco, credit card machines etc. should never use wifi - but I say that because I'm old(er) and think I saw a lot ^^
@cforker said in Strange network drop for 1 minute every hour:
192.10.1.1 every 60 minutes and 2 seconds
192.10.1.1 who /what is that ? It's a valid IP somewhere in the states.
Is it the airco that wants to call 'home' ?
It can't be DHCP ....
-
@Gertjan said in Strange network drop for 1 minute every hour:
192.10.1.1 who /what is that ?
NetRange: 192.10.0.0 - 192.10.255.255
CIDR: 192.10.0.0/16
NetName: AMAZO-4
Organization: Amazon.com, Inc. (AMAZO-4)
-
I'm assuming that was a typo.