SG3100 - Frequent Internet Drops
-
@jbgdev said in SG3100 - Frequent Internet Drops:
I have just tried disabling Gateway monitoring,
Hi,
If I were you, I'd check the DHCP and gateway logs; "dpinger" won't have much to do with it.
Of course, if you connect to the ISP from ONT like this (DHCP)
-
Sorry for the delay!
You were correct: disabling Gateway Monitoring did not solve the problem. The internet dropped again yesterday afternoon.
I will get into the logs and see if they shed some more light on the problem.
Can you explain a little more what you meant by the last sentence:
@daddygo said in SG3100 - Frequent Internet Drops:
Of course, if you connect to the ISP from ONT like this (DHCP)
I am new to networking; in the past I've just used the hardware provided by the ISP. My setup currently has a small box that the fiber line runs into, which feeds another box of about the same size. Both of these came from the ISP, and I'm not sure which one is technically the ONT.
The second box has an Ethernet port with a cable running to the Netgate router's WAN port.
-
@jbgdev said in SG3100 - Frequent Internet Drops:
Of course, if you connect to the ISP from ONT like this (DHCP)
No problem, we have time.
What I meant was that in a situation like yours, where the ISP provides GPON (fiber), the CPE is most likely an ONT device.
(I just guessed that an ONT is what you have.) Most GPON systems that use an ONT at the endpoint rely on DHCP to assign the IP address to the user.
1. That way you can keep good track in the DHCP log(s) of what happens when you get disconnected.
2. If it takes several days for the "ISP connection down" event to happen, you may need to increase the default log size so you can search log rows going back several days.
3. In addition, the DHCP settings can be fine-tuned. You might want to ask your ISP's support whether they use any special parameters (or you can experiment with this on your own once you have analysed the DHCP logs).
Here are the pictures, in step-by-step order, of what I was talking about:
- e.g. on "igb4"
- Log size (possibly increase it from 500K)
- DHCP fine-tuning on the WAN interface, e.g. Protocol Timing
-
@daddygo Awesome, thank you for this information! I have increased the log size and will check it next time I experience the problem. I will be back when I have new information to work with!
-
@jbgdev said in SG3100 - Frequent Internet Drops:
I will be back when I have new information to work with!
Okay, we'll be here
-
@daddygo Internet dropped at some point between midnight and 8AM. Beginning at 12:33AM, I see this message repeat every few minutes (sorry if I'm redacting information that doesn't really need to be redacted):
Jun 9 00:33:25 pfSense dhclient[97265]: DHCPREQUEST on mvneta2 to 10.xxx.xxx.xx port 67
After a few hours, this message finally started appearing, and it didn't stop until I unplugged the WAN around 8:30AM:
Jun 9 04:17:38 pfSense dhclient[91547]: send_packet: No route to host
During this time, several devices are making successful DHCPREQUESTs on mvneta1.
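To pin down when the renewals stopped being answered, the DHCP log can be scanned with a small script. The following is only a minimal Python sketch, not anything pfSense ships: the function name `summarize_dhclient`, the regular expression, and the `year` default are my own assumptions (syslog lines carry no year), and the pattern only matches the `pfSense dhclient[pid]:` line format quoted above.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the two log lines quoted in this post:
#   Jun 9 00:33:25 pfSense dhclient[97265]: DHCPREQUEST on mvneta2 to ... port 67
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ dhclient\[\d+\]: (?P<msg>.*)$"
)

def summarize_dhclient(lines, year=2021):
    """Return (unanswered_requests, first_unanswered, first_no_route).

    `year` is a placeholder assumption: syslog timestamps omit the year.
    """
    unanswered = 0
    first_unanswered = None
    first_no_route = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a dhclient line in the assumed format
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        msg = m["msg"]
        if msg.startswith("DHCPREQUEST"):
            unanswered += 1
            if first_unanswered is None:
                first_unanswered = ts
        elif msg.startswith("DHCPACK"):
            # The server answered, so the run of unanswered requests ends.
            unanswered = 0
            first_unanswered = None
        elif "No route to host" in msg and first_no_route is None:
            first_no_route = ts
    return unanswered, first_unanswered, first_no_route

if __name__ == "__main__":
    sample = [
        "Jun 9 00:33:25 pfSense dhclient[97265]: DHCPREQUEST on mvneta2 to 10.xxx.xxx.xx port 67",
        "Jun 9 04:17:38 pfSense dhclient[91547]: send_packet: No route to host",
    ]
    print(summarize_dhclient(sample))
```

Feeding it the full exported DHCP log instead of the two sample lines would show how long the requests went unanswered before the "No route to host" errors began.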
-
mvneta2 is WAN by default as I recall. That sounds like it's trying to renew its WAN IP and can't. One possibility is a problem with the upstream device. Instead of unplugging the cable from WAN on the 3100, you might try powering off that other device.
-
@steveits - The only device upstream is the ONT. I can try turning that off and back on tonight.
If a reset like that doesn't permanently fix the problem, I definitely prefer to just unplug the WAN and plug it back in. Obviously I prefer to not have to do anything!
Are there any settings that I should check that might explain why the WAN is unable to renew its IP? Is this something I need to contact my ISP about, or is there still reason to think the problem lies with the router and/or pfSense?
-
Internet works up until the "no route to host" message? That seems kind of like it's communicating but can't renew the DHCP lease, which I would think would be on the ONT's end... See if the System/Gateways logs show anything about the connection dropping.
You could try replacing the patch cable but that seems unlikely if it is connecting out otherwise.
-
@jbgdev said in SG3100 - Frequent Internet Drops:
DHCPREQUEST on mvneta2 to 10.xxx.xxx.xx port 67
It is possible that a scheduled address renewal plus a gateway restart is being triggered (from the ISP side!).
Right now it seems that the ISP's DHCP server (behind the ONT), which sits at an RFC1918 address, 10.x.x.x, is not responding to your requests.
1. First, please check the condition of the cable between the ONT and the pfSense WAN port (mvneta2). It should be a good-quality cable, Cat5e at minimum, not a dodgy one :)
2. Then try setting the speed negotiation on the WAN interface to a fixed 1000baseT <full-duplex> and test again.
3. If the above does not help, you can open a ticket with the service provider, as I do not think this is a pfSense or SG problem.
BTW:
A DHCP renewal plus a restart after midnight, hmmm, every day?
Frequent user IP address changes are not common in optical networks, but this depends on the service provider's routine.
What is typical is that you always get back the address you had before
(usually the provider will "bind" the endpoint's public DHCP address to the MAC address of your WAN port and leave it unchanged for a long time).
Here is what I have behind the ONT (in my case):
the information received on the WAN interface is refreshed every 1200 s.
-
@steveits I believe it works until the "no route to host" message, but I hope to verify for sure soon. Since it happened at 4AM, I was snoozing when the internet actually dropped. Sometimes it drops during the day so I'd have better luck reviewing the logs in that situation and pinpointing the minute it officially goes down.
Checking the gateway logs, this message appeared at 4:03, so about 14 minutes before the no route to host messages started:
Jun 9 04:03:23 dpinger 43389 WAN_DHCP 8.8.8.8: Alarm latency 32151us stddev 467us loss 22%
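The dpinger alarm reports latency and standard deviation in microseconds, which is easy to misread. As a rough illustration only, here is a small Python sketch that pulls the fields out of a line shaped like the one above and converts them to milliseconds. The function name `parse_dpinger_alarm` and the regular expression are my own assumptions based on this single log line, not an official pfSense format description.

```python
import re

# Assumed shape, based only on the alarm line quoted above:
#   ... dpinger <pid> <gateway> <monitor ip>: Alarm latency <n>us stddev <n>us loss <n>%
ALARM_RE = re.compile(
    r"dpinger \d+ (?P<gateway>\S+) (?P<monitor>\S+): Alarm "
    r"latency (?P<latency_us>\d+)us stddev (?P<stddev_us>\d+)us loss (?P<loss>\d+)%"
)

def parse_dpinger_alarm(line):
    """Return the alarm fields as a dict, or None if the line does not match."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return {
        "gateway": m["gateway"],
        "monitor_ip": m["monitor"],
        "latency_ms": int(m["latency_us"]) / 1000,  # microseconds -> milliseconds
        "stddev_ms": int(m["stddev_us"]) / 1000,
        "loss_pct": int(m["loss"]),
    }

if __name__ == "__main__":
    line = ("Jun 9 04:03:23 dpinger 43389 WAN_DHCP 8.8.8.8: "
            "Alarm latency 32151us stddev 467us loss 22%")
    print(parse_dpinger_alarm(line))
```

Read this way, the alarm says roughly 32 ms of latency with 22% packet loss toward the monitor address, so it is the loss figure rather than the latency that stands out.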
The System Logs only have entries during the couple of seconds between unplugging the WAN and plugging it back in.
I saw it suggested elsewhere to try putting a dumb switch between the ONT & SG-3100. I can give that a shot too, though I'd really like to fix it in this exact setup rather than adding more hardware.
-
@jbgdev said in SG3100 - Frequent Internet Drops:
to try putting a dumb switch between the ONT & SG-3100.
Yes, this is used as a debugging tool; it tests exactly what I suggested about speed negotiation.
+++edit:
My suspicion is that the firewall's Ethernet PHY (chip, IC) is not fully compatible with the PHY on the ISP ONT's Ethernet port, so the link can only renegotiate speed after a reboot or after the cable is disconnected and reconnected (this is mostly the case with Realtek's cheaper PHYs in ONTs),
and what may trigger the problem is a nightly restart of the ONT via TR-069.
-
@jbgdev 22% would be the packet loss. It could be temporary (around here Comcast drops out for a few minutes once a week or so). What are the other log entries around that one?
-
@steveits said in SG3100 - Frequent Internet Drops:
22% would be the packet loss.
22% is not the end of the world (for a short period); the link should recover without a problem and without dropping the connection.
Think of the older ADSL setups: we saw more than 22% there, especially if the DSLAM was more than 2 km from the endpoint.
@jbgdev
You can try inserting a cheap switch between the ONT and the SG:
https://www.tp-link.com/pt/business-networking/easy-smart-switch/tl-sg105e/
https://www.ui.com/edgemax/edgerouter-x/
(you can configure the latter as a switch from the menu and it also works as a router; for $30 you get a nice little debugging tool for the future).
Usually in GPON networks there is no limit on the number of addresses available at the endpoint (this is also provider-dependent), so you can get an IP via DHCP for two devices
(check this, because Comcast may limit it).
You can then test with an older router or a laptop to see whether or not they also drop the connection.
Be careful, because in this case you're not behind NAT, so you're out in the shop window.
This is how it works here in Portugal (but with IPoE on a VLAN at the ONT):
-
@jbgdev said in SG3100 - Frequent Internet Drops:
pfSense dhclient[97265]: DHCPREQUEST on mvneta2 to 10.xxx.xxx.xx
Still, strange.
An RFC1918 address (10.0.0.0/8) as a WAN IPv4, for a fiber connection using an ONT.
Do you really have an RFC1918 WAN IP?
The start of the issue is this: pfSense 'knows' that the DHCP lease for the WAN IP is running out (or, to be more precise, that it's halfway through) and starts to request a new one.
By preference, it will ask for the (WAN) IP it already uses.
Then either the ISP's DHCP server, a device on your ISP's premises, isn't answering,
or it never received the DHCP request, in which case it's only normal that it doesn't answer.
Such a situation is entirely plausible if the connection was 'broken', the ONT went brain-dead, the cable between pfSense and the ONT is bad, the ONT's NIC is bad, or the pfSense WAN NIC is a member of the Realtek family.
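The "halfway" point mentioned above is the T1 renewal timer from RFC 2131, which defaults to 50% of the lease time; T2, the rebinding timer, defaults to 87.5%. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, the 1200 s lease is only the figure mentioned earlier in this thread for a different connection, and a DHCP server may override both timers, so treat the numbers as an illustration rather than as what this ISP does.

```python
def dhcp_timers(lease_seconds):
    """Default RFC 2131 timers: T1 (renew) at 50%, T2 (rebind) at 87.5% of the lease."""
    t1 = lease_seconds * 0.5
    t2 = lease_seconds * 0.875
    return t1, t2

def client_state(elapsed_seconds, lease_seconds):
    """Name the DHCP client state for a given time since the lease was obtained."""
    t1, t2 = dhcp_timers(lease_seconds)
    if elapsed_seconds < t1:
        return "BOUND"        # nothing to do yet
    if elapsed_seconds < t2:
        return "RENEWING"     # unicast DHCPREQUEST to the server that gave the lease
    if elapsed_seconds < lease_seconds:
        return "REBINDING"    # broadcast DHCPREQUEST to any server
    return "EXPIRED"          # lease is gone, the address must be dropped

if __name__ == "__main__":
    lease = 1200  # illustrative value only
    print(dhcp_timers(lease))
    for elapsed in (300, 700, 1100, 1300):
        print(elapsed, client_state(elapsed, lease))
```

If the server stays silent through both the renewing and the rebinding phase, the lease expires and the client loses its address, which would fit requests repeating for hours before the connection finally goes down.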
The 'switch' idea should be tested.
If you plug a "PC" directly into the ONT, does it 'connect'? Do you see the same issue?
If not: you'll know the cable and ONT are OK, and the issue is on pfSense's side (WAN NIC ...).
If it does: put some pressure on your ISP by, for example, quitting them.
Btw: be careful about using some random IP like 8.8.8.8 as a "watchdog" for your connection.
The minute the "protector of 8.8.8.8" kicks in, you stop receiving replies to your ICMP requests and pfSense deduces that the connection is bad ... which is totally wrong, of course. 8.8.8.8 has other things to do than answer ICMP requests. It's a DNS server, not an "up-time tester" and ICMP replier.
The day Google decides, just for an hour or so, NOT to reply to ICMP requests on 8.8.8.8, billions will lose their connection. Exactly the same billions that "just don't understand it". They will learn that day. The hard way. It will be Google's fault, of course ....
-
@gertjan said in SG3100 - Frequent Internet Drops:
Still, strange.
An RFC1918 address (10.0.0.0/8) as a WAN IPv4, for a fiber connection using an ONT.
Do you really have an RFC1918 WAN IP?
Just a quick test with our ONT: I think it is not the WAN IP that is in the RFC1918 range, but the ISP's DHCP server behind the ONT.
For example, this is how GPON works in Portugal (Altice network, Nokia). It is a very good implementation for me, because with the following setup I get as many public IPs as I want
(well, it's not redundant, but I can use multiple public IPs).
The ONT's Ethernet port carries IPoE on VLAN12 (this is the default ISP VLAN, as it has VoIP and IPTV). I plug a Cisco SG350-10 with a VLAN12 trunk directly into the ONT's Ethernet port, and with a PVID configured on each switch port I get a different public IP on every port.
In the DHCP log of the "test" WAN2 interface, you can see that the ISP's DHCP server is in the RFC1918 range, 10.x.x.x.
It looks like this for me (DHCPOFFER from, and DHCPACK from, 10.x.x.x; this is where pfSense comes back to):
+++edit:
Oops, I left a public IP in. No problem, it was just a test anyway.
+++edit2:
I still think that speed negotiation is causing the connection to fail, so that pfSense cannot get back to the DHCP server behind the ONT (it could be temperature-induced on the PHY, or a silly, too-frequent TR-069 restart).
-
@daddygo : Oops, you're right, asking for a new lease from a DHCP server living on 10.a.b.c doesn't imply that the proposed WAN IP is also a 10.a.b.d address.
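For anyone who wants to check which range an address from their own logs falls into, here is a minimal Python sketch using the standard `ipaddress` module. The function name `classify` and the sample addresses are purely illustrative; the redacted addresses in this thread cannot be checked this way. The carrier-grade NAT block 100.64.0.0/10 (RFC 6598) is included because it is easy to mistake for public space.

```python
import ipaddress

# The three private ranges defined by RFC 1918.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
# Shared address space used for carrier-grade NAT (RFC 6598).
CGNAT = ipaddress.ip_network("100.64.0.0/10")

def classify(addr):
    """Return 'rfc1918', 'rfc6598' or 'other' for an IPv4 address string."""
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in RFC1918):
        return "rfc1918"
    if ip in CGNAT:
        return "rfc6598"
    return "other"

if __name__ == "__main__":
    # Illustrative addresses only, not taken from anyone's logs.
    for addr in ("10.1.2.3", "100.64.0.1", "100.1.2.3", "8.8.8.8"):
        print(addr, classify(addr))
```

Running the DHCP server address and the leased WAN address through the same check makes the distinction above concrete: the server can sit in 10.0.0.0/8 while the lease it hands out is in a different range.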
-
I really appreciate all the help and suggestions from this thread.
I swapped the cable between the ONT & SG3100 on June 9 at 8:45PM and lost internet this morning at 9:50. Old cable was a 10+ year old CAT5 and new one is a CAT5E. Was hoping the simplest fix would work but oh well.
There is no "No Route to host" message in the DHCP logs this time, so maybe that was an unrelated issue previously. At 9:49AM, this appears in the Gateway logs:
Jun 11 09:49:23 dpinger 84081 WAN_DHCP 8.8.8.8: Alarm latency 31856us stddev 459us loss 21%
At 9:50AM I received a notice on my phone that my network did not appear to be connected to the internet. This is the lone entry in the System log for the period from about 36 hours before I unplugged the WAN and plugged it back in until an hour after.
I do occasionally see in the logs:
Jun 11 11:44:56 dhclient 81802 DHCPREQUEST on mvneta2 to 255.255.255.255 port 67
This is followed by a DHCPACK response to my Gateway address (100.x.x.x) and then
Jun 11 11:44:56 dhclient 86546 RENEW
Jun 11 11:44:56 dhclient 87064 Creating resolv.conf
So, all of this being said, I will continue trying the other recommendations you all have given me, unless this updated info suggests a different solution to you all. This weekend I will put a dumb switch between the ONT & WAN to see if that solves the issue. Unfortunately it seems the only way to test is to change something and wait 30ish hours.
-
@jbgdev said in SG3100 - Frequent Internet Drops:
and wait 30ish hours.
Hmmmm....
There are still ideas on the list.
We look forward to your return, good luck
-
@daddygo said in SG3100 - Frequent Internet Drops:
@jbgdev said in SG3100 - Frequent Internet Drops:
to try putting a dumb switch between the ONT & SG-3100.
Yes, this is used as a debugging tool; it tests exactly what I suggested about speed negotiation.
+++edit:
My suspicion is that the firewall's Ethernet PHY (chip, IC) is not fully compatible with the PHY on the ISP ONT's Ethernet port, so the link can only renegotiate speed after a reboot or after the cable is disconnected and reconnected (this is mostly the case with Realtek's cheaper PHYs in ONTs),
and what may trigger the problem is a nightly restart of the ONT via TR-069.
I put a dumb switch between the ONT and router, and the internet dropped this morning around 7:30AM (roughly 37 hours after the change). Same errors in the log as before.
Based on this test and comments in this thread, does it seem like this is pointing back to the ISP and/or the ISP equipment?