SG3100 - Frequent Internet Drops
-
@steveits I believe it works until the "no route to host" message, but I hope to verify for sure soon. Since it happened at 4AM, I was snoozing when the internet actually dropped. Sometimes it drops during the day so I'd have better luck reviewing the logs in that situation and pinpointing the minute it officially goes down.
Checking the gateway logs, this message appeared at 4:03, so about 14 minutes before the no route to host messages started:
Jun 9 04:03:23 dpinger 43389 WAN_DHCP 8.8.8.8: Alarm latency 32151us stddev 467us loss 22%
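For anyone who wants to pull these alarm lines out of a longer gateway log, here is a minimal Python sketch. It assumes the line format is exactly the one quoted above; the function name `parse_dpinger_alarm` and the field names are made up for illustration and are not part of pfSense or dpinger.

```python
import re

# Pattern for the dpinger alarm line format quoted above (assumed stable).
ALARM_RE = re.compile(
    r"dpinger \d+ (?P<gateway>\S+) (?P<monitor>[\d.]+): Alarm "
    r"latency (?P<latency_us>\d+)us stddev (?P<stddev_us>\d+)us loss (?P<loss_pct>\d+)%"
)

def parse_dpinger_alarm(line):
    """Return gateway, monitor IP, latency/stddev in ms and loss in percent, or None."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return {
        "gateway": m.group("gateway"),
        "monitor": m.group("monitor"),
        "latency_ms": int(m.group("latency_us")) / 1000,  # microseconds -> milliseconds
        "stddev_ms": int(m.group("stddev_us")) / 1000,
        "loss_pct": int(m.group("loss_pct")),
    }

line = "Jun 9 04:03:23 dpinger 43389 WAN_DHCP 8.8.8.8: Alarm latency 32151us stddev 467us loss 22%"
print(parse_dpinger_alarm(line))
```

Read this way, the latency in that entry is only about 32 ms; the 22% loss is the notable figure.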
The System Logs only have entries during the couple of seconds between unplugging the WAN and plugging it back in.
I saw it suggested elsewhere to try putting a dumb switch between the ONT & SG-3100. I can give that a shot too, though I'd really like to fix it in this exact setup rather than adding more hardware.
-
@jbgdev said in SG3100 - Frequent Internet Drops:
to try putting a dumb switch between the ONT & SG-3100.
Yes, this is used as a debugging tool, and it points to exactly what I suggested about speed negotiation.
+++edit:
The firewall's eth. PHY (chip, IC) is not fully compatible with the eth. port PHY of the ISP's ONT, so it can only negotiate speed again after a reboot or a cable disconnect/reconnect (this is mostly the case with Realtek's cheaper PHYs in the ONT).
What may trigger this problem is a nightly restart of the ONT via TR-069.
-
@jbgdev 22% would be the packet loss. It could be temporary (around here Comcast drops out for a few minutes once a week or so). What are the other log entries around that one?
-
@steveits said in SG3100 - Frequent Internet Drops:
22% would be the packet loss.
22% is not the end of the world (for a short period); the link should recover without a problem and without dropping the connection.
Think of the older ADSL setups: we saw more than 22% there, especially if the DSLAM was more than 2 km from the endpoint.
@jbgdev
You can try inserting a cheap switch between the ONT and the SG:
https://www.tp-link.com/pt/business-networking/easy-smart-switch/tl-sg105e/
https://www.ui.com/edgemax/edgerouter-x/
(You can configure it as a switch from the menu and it also works as a router; for $30 you get a nice little debugging tool for the future.)
Usually in GPON networks there is no limit on the addresses available at the endpoint (this is also provider dependent), so you can get an IP via DHCP for two devices (check this, because Comcast may limit it).
You can then test with an older router or a laptop to see whether or not they also drop the connection.
Be careful, because in this case you're not behind NAT, so you're out in the shop window.
This is how it works here in PT (but the ONT uses IPoE on a VLAN):
-
@jbgdev said in SG3100 - Frequent Internet Drops:
pfSense dhclient[97265]: DHCPREQUEST on mvneta2 to 10.xxx.xxx.xx
Still, strange.
A RFC1918 (10.0.0.0/8) as a WAN IPv4, for a fiber connection using an ONT.
Normally, do you really have an RFC1918 WAN IP?
The start of the issue is: pfSense 'knows' the WAN IP DHCP lease is going to time out (or, to be more precise, that it's halfway through) and starts to request a new one.
By preference, it will ask for the (WAN) IP it already uses.
Either the ISP DHCP server, a device in the rooms of your ISP, isn't answering.
Or it didn't receive the DHCP request, so it's only normal that it doesn't answer.
Such a situation is totally understandable if the connection was 'broken', or the ONT became brain dead, or the cable between pfSense and the ONT is bad, or the NIC of the ONT is bad, or the pfSense WAN NIC is a member of the Realtek family.
The 'switch' idea should be tested.
If you plug a "PC" directly into the ONT, does it 'connect'? Do you see the same issue?
If not: you'll know the cable and ONT are OK, and the issue is on pfSense's side (WAN NIC ....)
If it does: put some pressure on your ISP by, for example, quitting them.
Btw: be careful about using some random IP like 8.8.8.8 as a "watchdog" for your connection.
The minute the "protector of 8.8.8.8" kicks in, you stop receiving replies to your ICMP requests and pfSense deduces the connection is bad ... which is totally wrong of course. 8.8.8.8 has other things to do than answering ICMP requests. It's a DNS server, not an "up-time tester" and ICMP replier.
The day that Google decides, just for an hour or so, NOT to reply from 8.8.8.8 to ICMP requests, billions will lose their connection. Exactly the same billions that "just don't understand it". They will learn that day. The hard way. It will be Google's fault, of course ....
-
@gertjan said in SG3100 - Frequent Internet Drops:
Still, strange.
A RFC1918 (10.0.0.0/8) as a WAN IPv4, for a fiber connection using an ONT.
Normally, you really have a RFC1918 WAN IP ?
Just a quick test with our ONT: I think it is not the WAN IP that is in the RFC1918 range, but the ISP DHCP server behind the ONT.
E.g., this is how GPON works in Portugal (Altice network, Nokia). For me this is a very good implementation, because with the following solution I get as many public IPs as I want...
(well, it's not redundant, but I can use multiple public IPs)
IPoE is on VLAN12 on the ONT eth. port (this is the default ISP VLAN, as it carries VoIP and IPTV). I plug a Cisco SG350-10 with a VLAN12 trunk directly into the eth. port of the ONT, and with the PVID configuration I get a different public IP on every switch port.
In the DHCP log of the "test" WAN2 interface, you can see that the ISP's DHCP server is in RFC1918 10.x.x.x.
It looks like this for me... (DHCPOFFER from / and PACK 10.x.x.x (pfSense returns here)
+++edit:
Oops, I left a public IP in. No problem, it was just a test anyway.
+++edit2:
I still think that speed negotiation is causing the connection to fail, after which pfSense cannot get back to the DHCP server behind the ONT (it could be temperature-related on the PHY, or stupidly frequent TR-069 restarts).
-
@daddygo : Oops, you're right, requesting a new lease from a DHCP server living on 10.a.b.c doesn't imply that the proposed WAN IP is also a 10.a.b.d.
-
I really appreciate all the help and suggestions from this thread.
I swapped the cable between the ONT & SG3100 on June 9 at 8:45PM and lost internet this morning at 9:50. Old cable was a 10+ year old CAT5 and new one is a CAT5E. Was hoping the simplest fix would work but oh well.
There is no "No Route to host" message in the DHCP logs this time, so maybe that was an unrelated issue previously. At 9:49AM, this appears in the Gateway logs:
Jun 11 09:49:23 dpinger 84081 WAN_DHCP 8.8.8.8: Alarm latency 31856us stddev 459us loss 21%
At 9:50AM I received a notice on my phone that my network does not appear connected to the internet. This is the lone entry in the System log from about 36 hours before to an hour after when I unplugged the WAN and plugged back in.
I do occasionally see in the logs:
Jun 11 11:44:56 dhclient 81802 DHCPREQUEST on mvneta2 to 255.255.255.255 port 67
and is followed by a DHCPACK response to my Gateway address (100.x.x.x) and then
Jun 11 11:44:56 dhclient 86546 RENEW
Jun 11 11:44:56 dhclient 87064 Creating resolv.conf
So, all of this being said, I will continue trying the other recommendations you all have given me, unless this updated info suggests a different solution to you all. This weekend I will put a dumb switch between the ONT & WAN to see if that solves the issue. Unfortunately it seems the only way to test is to change something and wait 30ish hours.
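To keep track of how long the connection survives after each change, the interval arithmetic can be sketched in Python. This is only an illustration: the helper name `hours_between` is made up, and since the logs carry no year, an arbitrary placeholder year is assumed.

```python
from datetime import datetime

# The logs carry no year; 2021 is an arbitrary placeholder for the arithmetic.
ASSUMED_YEAR = 2021

def hours_between(start, end, year=ASSUMED_YEAR):
    """Hours between two 'Mon D HH:MM' timestamps that fall in the same year."""
    fmt = "%Y %b %d %H:%M"
    t0 = datetime.strptime(f"{year} {start}", fmt)
    t1 = datetime.strptime(f"{year} {end}", fmt)
    return (t1 - t0).total_seconds() / 3600

# Cable swapped Jun 9 at 8:45PM, internet lost Jun 11 at 9:50AM.
print(round(hours_between("Jun 9 20:45", "Jun 11 09:50"), 1))
```

For the cable swap described above, that works out to roughly 37 hours between the change and the drop.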
-
@jbgdev said in SG3100 - Frequent Internet Drops:
and wait 30ish hours.
Hmmmm....
There are still ideas on the list.
We look forward to your update, good luck.
-
@daddygo said in SG3100 - Frequent Internet Drops:
@jbgdev said in SG3100 - Frequent Internet Drops:
to try putting a dumb switch between the ONT & SG-3100.
Yes, this is used as a debugging tool, and it points to exactly what I suggested about speed negotiation.
+++edit:
The firewall's eth. PHY (chip, IC) is not fully compatible with the eth. port PHY of the ISP's ONT, so it can only negotiate speed again after a reboot or a cable disconnect/reconnect (this is mostly the case with Realtek's cheaper PHYs in the ONT).
What may trigger this problem is a nightly restart of the ONT via TR-069.
I put a dumb switch between the ONT and router, internet dropped this morning around 7:30AM (roughly 37 hours after the change). Same errors in the log as before.
Based on this test and comments in this thread, does it seem like this is pointing back to the ISP and/or the ISP equipment?
-
@jbgdev said in SG3100 - Frequent Internet Drops:
does it seem like this is pointing back to the ISP and/or the ISP equipment?
Yes, it seems that way to me too. If you have an old router, maybe you can test with it for a few days; yes, until then you'll have to put pfSense aside for a bit of test time...
BTW:
This is good to have, because at the first attempt the ISP will say that the fault (problem) is in your device, the SG.
This is the basic attitude of all ISPs.
-
@daddygo I will swap an old router in this weekend. You are correct, as expected, the ISP told me the problem must be with my router.
-
@jbgdev said in SG3100 - Frequent Internet Drops:
You are correct, as expected
Hmmmm
It's sad but true, and exactly what I expected. Most of the time they are not right, and they have no knowledge of equipment they did not provide.... (like the SG)
So their answer is that your device is bad, but they still expect you to pay the bill on time.
Speaking from experience, it can be a long fight. I hope not; let's see what an old router shows.
BTW:
The annoying thing is that all they had to do was look at the log file on their side and we'd be in the picture.
-
@daddygo - Network has been running fine for about 5 days using a TP Link Archer 7 router.
I am on Metronet. I found another post in the forum of a Metronet user that resolved their problem by purchasing a Static IP, which is an additional fee per month. I'd prefer not to do that.
I also tried changing the speed negotiation to 1000baseT Full Duplex. When time came for the DHCP Lease renewal, the network went down and I could not get back online until I changed the setting back to automatic - i.e. unplugging/re-plugging the network cable did not get me back online like it usually did.
Are there any other avenues to go down to get the SG3100 working?
-
@jbgdev said in SG3100 - Frequent Internet Drops:
Are there any other avenues to go down to get the SG3100 working?
Hi,
Somehow I had a feeling that this was going to be the outcome, which is why I suggested these tests (old router).
I'm really sorry; this nifty little Netgate box works well for almost everyone, but still.
I can also confirm that the problem is definitely not SG compatibility; you may have run into an issue where the ISP CPE device and the SG eth. PHY will not work properly together....(hmmm?)
I have an idea, .....
to go through all the possibilities (but only if you feel like it):
- What pfSense version are you running now? (let's look at an older one? - test 1)
- What do you see in the ARP table, what network device is the ISP using? (test 2)
and /or
MAC vendor: https://macvendors.com/
(I had the experience with DOCSIS modems that the ISP's terminal equipment (DSLAM (ADSL), CMTS (DOCSIS), GPON ONT (FTTH), etc.) preferred its own MAC address range.
I wouldn't go too deep into it, but, for example, a Cisco CMTS unit prefers a Cisco WAN MAC address. My parents have a DOCSIS4 system like this, with a spoofed Cisco MAC address.)
They had very similar problems until I set a fake (spoofed) Cisco MAC address.
This topic is getting more and more interesting. If you want to try, we can learn from it.
BTW:
Here is a brief explanation:
originally the provider gave my parents a Cisco router as the ISP CPE, and
I replaced it with a pcEngines APU4 pfSense NGFW. Well, the service provider didn't like it, and we had constant and similar problems...
The solution was to add the MAC address of the original Cisco router (from the ISP) to the pfSense box's WAN, and voila, it has been working for hundreds of years...
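If you want to check which vendor a MAC from the ARP table belongs to, the part that matters is the first three octets (the OUI). A minimal Python sketch for extracting it follows; the helper name `oui_prefix` and the sample address are placeholders for illustration, not values from this thread.

```python
import re

def oui_prefix(mac):
    """Return the first three octets (the vendor OUI) of a MAC as 'AA:BB:CC'."""
    # Drop separators such as ':', '-' or '.' and keep only the hex digits.
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    digits = digits.upper()
    return ":".join(digits[i:i + 2] for i in range(0, 6, 2))

# 00:11:22:33:44:55 is a made-up placeholder, not an address from this thread.
print(oui_prefix("00:11:22:33:44:55"))  # 00:11:22
```

The resulting prefix is what you would look up on a site such as macvendors.com.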
-
-
@daddygo I finally worked with Netgate Sales, which I suppose would have been a good place to start. I've been up and running for over a week now!
Their instructions:
**Under "Interfaces / WAN // DHCP Client Configuration", check "Advanced Configuration". Then on the "// Lease Requirements and Requests / Option modifiers" enter the following:
supersede dhcp-lease-time 3600
Reboot afterwards under "Diagnostics / Reboot"**
I wanted to be sure to share this back here in case anyone else experiences this issue in the future.
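For context on what a 3600 second lease means for renewals, here is a small Python sketch. It assumes the client uses the RFC 2131 default timers (renew at 50% of the lease, rebind at 87.5%); the DHCP server may send its own timer values and the client may add some jitter, so treat the numbers as approximate. The function name `renewal_timers` is made up for illustration, and the thread does not say why this setting resolved the drops.

```python
def renewal_timers(lease_seconds):
    """Default RFC 2131 timers: T1 (renew) at 50%, T2 (rebind) at 87.5% of the lease."""
    return {
        "lease": lease_seconds,
        "t1_renew": lease_seconds * 0.5,     # client starts unicast renewal
        "t2_rebind": lease_seconds * 0.875,  # client falls back to broadcast rebinding
    }

print(renewal_timers(3600))
```

Under those assumptions, a superseded lease time of 3600 seconds has the client attempting a renewal about every 30 minutes.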
-
@jbgdev said in SG3100 - Frequent Internet Drops:
I finally worked with Netgate Sales, suppose would have been a good place to start.
Yup, I never doubted it
I think the forum is always a good start and it's also Netgate anyway, but I'm glad you did it on your own.
BTW:
@jbgdev "Advanced Configuration"
DHCP fine-tuning on the WAN intf., e.g. Protocol Timing:
As if someone had said it before....