SG-2100 dropping LAN connections with EM7305 Cellular WAN
-
Hi All
I have an issue on a new SG-2100: all LAN ports are disconnecting when using an EM7305 LTE card for the WAN interface.
The problem only occurs when the modem is connected to the cellular network.
If the unit is booted with no SIM, or the WAN has a wired connection (DHCP or PPPoE), access to the dashboard GUI through the LAN ports works as expected without disconnecting.
Thanks in advance to anyone who takes the time to respond and make suggestions on what I could do next. Appreciate it.
Setup
- New SG-2100 + Sierra Wireless EM7305 installed + paddle antenna fitted
- Boot pfSense with default factory config => log in to dashboard @ 192.168.1.1
- Interfaces => Assignments => PPPs => Add => (Link Type=PPP + Link Interface=/dev/cuaU0.2 + country/vendor/plan/Ph# settings) => Save
- Interfaces => Assignments => click WAN interface => IPv4 Configuration Type=PPP => (complete PPP Config section again) => Save
- Shutdown => insert SIM => power up
Behaviour
- Main dashboard shows the cellular WAN interface is up and an IPv4 address has been assigned by the ISP.
- NTP syncs with pool servers and acquires an active peer. Names are resolved to IP addresses at Diagnostics => DNS Lookup.
- I can continue working with the GUI to customize options for some minutes (often single digits), and then the wired connection from the laptop to the SG-2100 LAN port is dropped. The laptop falls back to its wifi connection and the Egress/Ingress LEDs freeze on the SG-2100 LAN1 network port. I can disconnect the ethernet cable from the SG-2100 and the LEDs remain in their frozen state. (I have done this enough times to see the LEDs freeze in all 4 possible combinations.)
Time taken for the link to drop is random but often less than 10 minutes. I've had one instance where it stayed up for 23 minutes and I was able to stream YouTube clips and complete a couple of Ookla speedtests over the cellular WAN before the LAN link dropped.
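To put numbers on "random but often less than 10 minutes", a laptop-side logger along these lines could record how long the LAN link stays up after each power cycle. This is only a minimal sketch in Python: the function names (`ping_once`, `time_to_drop`, `watch`), the 5-second interval and the "three missed pings in a row" rule are assumptions made for illustration, and it assumes a Unix-like `ping` that accepts `-c`.

```python
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Return True if a single ICMP echo to host succeeds."""
    # -c 1 sends one probe. The per-platform wait flags differ, so the
    # subprocess timeout is used to bound the call instead.
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def time_to_drop(samples, misses_needed=3):
    """samples: (timestamp_seconds, ok) pairs in time order.

    Return the seconds between the first sample and the start of the
    first run of `misses_needed` consecutive failures, or None if no
    such run exists yet.
    """
    start = None
    run_start = None
    run_len = 0
    for ts, ok in samples:
        if start is None:
            start = ts
        if ok:
            run_len = 0
            run_start = None
            continue
        if run_len == 0:
            run_start = ts
        run_len += 1
        if run_len >= misses_needed:
            return run_start - start
    return None


def watch(host="192.168.1.1", interval_s=5):
    """Ping host until the link is judged lost; return seconds it stayed up."""
    samples = []
    while True:
        samples.append((time.monotonic(), ping_once(host)))
        elapsed = time_to_drop(samples)
        if elapsed is not None:
            return elapsed
        time.sleep(interval_s)
```

Calling `watch()` right after the GUI becomes reachable and noting the returned value for each attempt would give a list of times-to-failure to share in the thread.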
Debug actions
- Disconnect / reconnect LAN ethernet cable?
  Once the connection drops there is no ethernet communication between laptop and router. Pulling the cable / waiting / reinserting the cable into any of the LAN ports 1-4 does not re-establish a connection and the laptop sits waiting to be served an IP address.
- ifconfig down/up on the laptop?
  No new IP address issued. No connectivity.
- Use console to restart/reboot?
  I have console access from another PC using PuTTY and can reboot / restart the SG-2100 without removing the power cable, but that doesn't recover communication through the LAN ports.
- Power cycle the SG-2100?
  Doing a command line shutdown and then power cycling the unit by pulling out/reinserting the power connector does reset the LAN port to active and I get browser access to the GUI.
I have used multiple ethernet cables, 2 separate laptops and a tower PC. All give same results as described above so I am inclined to think the problem lies with SG-2100 + EM7305 WWAN card.
I have probed the configuration of WWAN card using AT commands (shown below). It all looks similar to other configurations I have seen on various forums and clearly the modem is working well enough to get connectivity with the cell tower, but I am not familiar enough to know if there is a misconfiguration that could cause the SG-2100 switch function to appear to crash while the remainder of the interfaces appear to be working.
Am wondering whether the combination of old WWAN card (EM7305 went EoL in May 2019) + new product introduction (SG-2100 Sept 2020) has exposed a firmware corner case that locks up the SoC LAN ports and requires a power on reset to recover?
I have seen some forum comments where EM7305 appears to be working OK in an SG-3100 and assumed that an SG-2100 would be OK as well, but would welcome comments from anyone who can speak to the HW/FW similarities between the two products and confirm that my assumption is reasonable or whether the 2100 could have an issue with an EM7305 where the 3100 does not.
My Probing with AT Commands
@ Console => Option 8 (pfSense shell) => cu -l /dev/cuaU0.2
at!entercnd="A710" (unlock 'AT!' commands)
at!udusbcomp=? (show compositions supported by WWAN card)
at!udusbcomp? (show current composition)
at!gobiimpref? (show FW/carrier/configs available/current)
at!udinfo? (Modem card details)
at!pcinfo?
-
-
Hmm, I would certainly expect that to work. I tested it with an em7345, though that is a different connection type.
The behaviour you describe with the port LEDs remaining lit even when you pull the cable could only really be the switch IC.
Since you're still able to access the console, try running etherswitchcfg. Does the switch respond?
To be clear, this doesn't happen if the WAN port is connected to a valid connection, even if the LTE connection is up with traffic going across it?
I have an em7305 I should be able to test this.
Steve
-
Ok, I was able to test this but have been unable to replicate it. The connection stays up as does the switch ports.
[2.4.5-RELEASE][admin@2100-2.stevew.lan]/root: usbconfig -d ugen0.2 dump_device_desc
ugen0.2: <Sierra Wireless, Incorporated EM7305> at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (500mA)
  bLength = 0x0012
  bDescriptorType = 0x0001
  bcdUSB = 0x0200
  bDeviceClass = 0x0000 <Probed by interface class>
  bDeviceSubClass = 0x0000
  bDeviceProtocol = 0x0000
  bMaxPacketSize0 = 0x0040
  idVendor = 0x1199
  idProduct = 0x9041
  bcdDevice = 0x0006
  iManufacturer = 0x0001 <Sierra Wireless, Incorporated>
  iProduct = 0x0002 <EM7305>
  iSerialNumber = 0x0003 <>
  bNumConfigurations = 0x0002

ppp0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
  inet 92.41.185.5 --> 10.64.64.0 netmask 0xffffffff
  inet6 fe80::2e0:edff:feb6:1359%ppp0 prefixlen 64 scopeid 0xc
  nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
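For anyone wanting to compare modules side by side, the vendor and product IDs can be pulled out of a descriptor dump like the one above with a few lines of Python. This is just a rough sketch: the helper names `parse_descriptor` and `usb_id` are invented for illustration, and `SAMPLE` is only a fragment of the dump, pasted as one line.

```python
import re

# Matches "name = 0x1234" pairs; works whether the dump is pasted as
# one long line or with one field per line.
FIELD_RE = re.compile(r"(\w+)\s*=\s*(0x[0-9a-fA-F]+)")

# Fragment of the usbconfig output quoted in this thread.
SAMPLE = "bcdUSB = 0x0200 idVendor = 0x1199 idProduct = 0x9041 bcdDevice = 0x0006"


def parse_descriptor(text):
    """Return a dict mapping descriptor field names to integer values."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(text)}


def usb_id(text):
    """Return the familiar 'vendor:product' hex string for a dump."""
    fields = parse_descriptor(text)
    return "{:04x}:{:04x}".format(fields["idVendor"], fields["idProduct"])


print(usb_id(SAMPLE))  # prints 1199:9041
```

Fields such as `cfg=0` or `pwr=ON` on the header line are ignored because they do not carry a `0x` value.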
Even get reasonable speeds:
[2.4.5-RELEASE][admin@2100-2.stevew.lan]/root: speedtest
Retrieving speedtest.net configuration...
Testing from Three (92.41.185.5)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by GMCHosting LLC (London) [1.20 km]: 52.674 ms
Testing download speed................................................................................
Download: 21.49 Mbit/s
Testing upload speed......................................................................................................
Upload: 26.36 Mbit/s
Have you been able to check the etherswitchcfg output?
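If it helps when comparing outputs, the port status lines from etherswitchcfg can be summarised with a short script. A rough sketch only, under stated assumptions: `SAMPLE` is a cut-down illustration of the usual etherswitchcfg layout (`portN:` blocks with a `status:` line, `laggroupN:` blocks), not output captured from an SG-2100; the function names are invented for this example; and it assumes a Python 3 interpreter is available on the box, or that the output is copied to another machine.

```python
import re
import subprocess

PORT_RE = re.compile(r"^port(\d+):")
LAGG_RE = re.compile(r"^laggroup(\d+):")
STATUS_RE = re.compile(r"^\s+status:\s*(.+)$")

# Illustrative text only (assumed layout), not real SG-2100 output.
SAMPLE = """etherswitch0: VLAN mode: DOT1Q
port1:
        pvid: 1
        status: no carrier
port3:
        pvid: 1
        status: active
"""


def run_etherswitchcfg(timeout_s=10):
    """Run etherswitchcfg with no arguments; return its text or None."""
    try:
        result = subprocess.run(
            ["etherswitchcfg"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Covers a hung command as well as a missing binary or device.
        return None
    return result.stdout


def summarise(text):
    """Return ({port_number: status}, [lagg group numbers]) from the text."""
    ports = {}
    laggs = []
    current = None
    for line in text.splitlines():
        match = PORT_RE.match(line)
        if match:
            current = int(match.group(1))
            continue
        match = LAGG_RE.match(line)
        if match:
            laggs.append(int(match.group(1)))
            current = None
            continue
        match = STATUS_RE.match(line)
        if match and current is not None:
            ports[current] = match.group(1).strip()
    return ports, laggs


def suspicious(ports, laggs):
    """Heuristic: lagg groups listed and every port claiming 'active'."""
    return bool(ports) and bool(laggs) and all(
        status == "active" for status in ports.values()
    )
```

Running `summarise(run_etherswitchcfg())` every minute or so and logging the result would show the moment the reported state changes, provided the command returned text rather than `None`.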
Steve
-
@stephenw10 Thanks for that Stephen.
Yes, I have managed to do some testing in between answering the front door to the Amazon delivery guy, who I am sure feels like he is on a bungee cord pegged to my doorstep.
Here's what I have:
- Power on SG-2100 with SIM installed.
- Connect laptop to LAN3 and log in @ 192.168.1.1 => WAN is up, NTP and DNS both connected over WAN, traverse dashboard until LAN3 locks up and no more dashboard access from browser.
- Pull LAN3 cable => see photo below of LAN3 port LEDs frozen with no cable attached.
- @ Console => etherswitchcfg => see picture below. Why are all of the ports showing "active" with HW loopback and 16 link aggregation groups?
- @ Console => exit shell => select 5 to reboot the SG-2100 => select "Y" => unit reboots (I DO NOT power cycle) => WAN connects and gets an IP address, but LAN port lights do not flash and I am unable to get an IP address when connecting a cable to any LAN port.
- @ Console => etherswitchcfg => the switch function doesn't appear to have come up as there is no device listed in /dev.
- @ Console => shutdown -p now => after the unit has shut down I disconnect the power cable => wait 15 secs => reconnect power and reboot.
- @ Console => etherswitchcfg => note that all LAN ports 1-4 show no carrier as no cables are attached. Also note there are no laggs listed.
- Inserting a cable into LAN port 3 serves the laptop with an IP address and I can access the GUI from a browser.
- @ Console => etherswitchcfg => now lists LAN3 as active, as expected.
- Do nothing => after 8 mins, lights on LAN3 have gone out, no connectivity from laptop to GUI via browser, and the "etherswitchcfg" command at console gives the following => all the lagg groups are back and all LAN ports are active in loopback only.
Clearly something is not right. It might very well be the EM7305 card, as that is the thing I have that you don't. I took delivery of an EM7455 this morning and plan to test it. Will report back on how that goes, but it's unlikely to be before the end of the weekend.
Have a great Christmas break and thanks for helping out. We can pick up again next week, or whenever you're free.
-
-
Hmm, yeah that's very wrong!
I was testing with an identical card, EM7305. The one thing that is definitely different are the antennas I'm using and the positions I'm using them in.
Hard to imagine that makes any difference, but I guess it potentially could.
Steve
-
Thanks Steve
You have your antenna in the logical place. I'm about 2mm shy of being able to connect to the "Main" interface from the socket next to the USB port using the pigtails I purchased from eBay.
Didn't want to "encourage them" to lean more toward the WWAN card as it felt quite rigid when I gave it a "nudge".
Not sure that the location of the main antenna above LAN1 would impact the decision of the ethernet switch function to not come up in the SoC. I am more thinking that it is an interrupt thing from the WWAN card that is causing the switch functionality to lock. Hopefully the new WWAN card will help move forward in understanding where the problem lies.
Andy
Mmmm, I agree the antenna location seems unlikely to make a difference there.
It looks like the switch IC is 'hanging' for want of a better term. It should run entirely independently of the CPU once the ports are set as forwarding which they clearly are.
I suspect that output from etherswitchcfg you're seeing is completely bogus. It's probably reading all 1s or all 0s from the switch registers because it's not actually responding at all. If you tried to set something on the switch I imagine it would fail.
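As a way of checking that idea, a handful of raw register reads could be tested for the "all 1s or all 0s" pattern. A minimal Python sketch, assuming 16-bit registers and that the values have already been read by some other means; `looks_unresponsive` is a made-up helper name, not part of any pfSense or FreeBSD tool.

```python
def looks_unresponsive(values, width_bits=16):
    """Return True if every read is 0, or every read is all ones.

    A device that is not answering on the bus typically reads back as
    one constant pattern, so mixed values suggest it is still alive.
    """
    values = list(values)
    if not values:
        return False  # nothing read, so nothing can be concluded
    all_ones = (1 << width_bits) - 1
    return all(v == 0 for v in values) or all(v == all_ones for v in values)


print(looks_unresponsive([0xFFFF, 0xFFFF, 0xFFFF]))  # prints True
print(looks_unresponsive([0x0000, 0xFFFF, 0x1140]))  # prints False
```

A `True` result is only a hint: a small sample of registers could legitimately all be zero, so reading several different registers makes the check more meaningful.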
I've not seen that happen before on anything though. I have no explanation for why it would only happen with the modem in action.
Steve
-
OK, so I have done some more testing.
TL;DR
- Tested with another modem module and LAN ports are still dropping.
- Q: What's the latest recommended BIOS version for SG-2100?
- Q: Can I switch out my new SG-2100 for another one or SG-3100?
--------------------------------
Detailed Summary of Testing
Re-profiled a DW5811e to a Generic EM7455 using the script and guidance available on this site.
=> Created an Ubuntu 18.04.5 LiveUSB
=> Mounted modem in recommended external USB adapter
=> Used bash script to reprofile module to be a Generic EM7455.
(FYI, a small glitch in the script required a minor edit for it to work for me; see ** at the end of this post for details.)
=> Running the above script with the "-s" option runs a list of AT commands to probe modem settings. Output shows as follows. (I have edited out MEID/IMEI/FSN)
From my limited experience with LTE modules, it looks to be configured OK to work as a generic EM7455 card where AT commands should work for a PPP connection, but I would welcome comments to the contrary.
Module was removed from external adapter, mounted in the SG-2100 and booted up, but no WAN IP address was issued.
It was rebooted multiple times over several hours of reading/testing to try and find anything I had missed in converting from a DW5811e to a Generic EM7455,... but it all looked good.
Also checked the following.
- Are IPEX connectors mated firmly to EM7455? => Yes
- Is the SIM plugged in? => Yes.
- Is the AT port working? => Believe so. @ pfSense dashboard => Status => System Logs => PPP log showed that the "AT" CHAT was working OK.
- Is the right APN being issued for the ISP?
  => Status page showed that the modem was issuing the correct APN via the +CGDCONT command in the CHAT
  => Authorization reported as "successful"
  => Link reported as "UP" and assigned to WAN
  => IPCP would attempt to negotiate an IP address but never received one from the cell network
  => With the peer not responding to echo requests the link is eventually closed and the cycle starts again.
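For reference, the APN step in that chat is the standard 3GPP AT+CGDCONT (define PDP context) command. A minimal Python sketch of how the string is put together; the helper name `cgdcont` and the APN `example.apn` are placeholders, and pfSense builds its own chat script, so this is purely illustrative.

```python
def cgdcont(cid, pdp_type, apn):
    """Build an AT+CGDCONT command that defines PDP context `cid`.

    cid      -- context identifier (1 is the usual first context)
    pdp_type -- packet data protocol type, e.g. "IP"
    apn      -- access point name given by the carrier
    """
    for field in (pdp_type, apn):
        if '"' in field:
            raise ValueError("fields must not contain double quotes")
    return 'AT+CGDCONT={},"{}","{}"'.format(cid, pdp_type, apn)


print(cgdcont(1, "IP", "example.apn"))  # prints AT+CGDCONT=1,"IP","example.apn"
```

Comparing a string built this way against the one shown in the PPP log is a quick check that the APN and context number being sent are the ones the carrier expects.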
This post described a similar situation that was overcome by checking the pfsense config file had a country path specified in the PPP section of the config.xml file........ I looked at that in the SG-2100 but mine had a country entry and it still wasn't working for me.
...... on the plus side, the LAN ports didn't crash while doing this analysis which is my original issue, but of little benefit with no WAN.
Move Module back to external adapter
With everything looking OK but the WAN not getting an IP address on the SG-2100 after multiple power cycle reboots, I decided to move the EM7455 back to the external USB adapter, add the SIM and see if I could get an internet connection on the Ubuntu laptop to at least confirm that the module was working.
=> Ubuntu ModemManager => select EM7455 => edit => Add connection => OK => Cellular connection available => internet connection was achieved
=> Was able to do multiple Ookla speedtests giving 15Mbps down and 1.2Mbps up (rubbish performance I know, but hey, I'll take it!)
So the card works as an external USB modem, but I suspect it uses the MBIM interface with Ubuntu, which I believe is still not supported/available in FreeBSD, so there are two possible reasons it's not working:
- still something else missing in the module config for PPP mode to work on the SG-2100 as an internal modem
- something wrong with my SG-2100 HW/FW
Now it Works?!
I moved it back to the SG-2100 and rebooted with the intention of writing AT commands to the module, but surprisingly I now get a WAN IP address and internet access without doing anything further. Strange! Maybe the cellular network registered the WWAN/SIM combination when it connected as an external modem under Ubuntu and reduced the level of authentication or reconnection steps required to reconnect the same WWAN/SIM combination when installed as an internal modem on FreeBSD in the SG-2100? (I am just guessing out loud here. I do not like things fixing themselves without understanding the root cause. Any ideas, anyone?)
Now it doesn't
So with the EM7455 now getting an IPv4 address and connecting to the internet, I got to do 2.5 Ookla speed tests and then the LAN ports froze, same behaviour as with the EM7305 module.
Given that the LAN ports "freezing" is now an issue exhibited with two separate WWAN modules, I am leaning more toward the issue being with the SG-2100 unit.
What BIOS version should it be?
Stephen, your SG-2100 is working OK with the EM7305. Q: What version of BIOS/Firmware is it running? I am assuming "latest production", but you may have something more advanced for testing?
My SG-2100 BIOS/Firmware details are:
- BIOS: U-Boot 2018.03-devel-1.2.0Rogue2-01.00.00.02+ Feb 7th 2020
- pfsense 2.4.5p1
- FreeBSD 11.3-STABLE
I have assumed mine is the latest production, what with it being a brand new unit, but to be honest I am struggling to find what the latest firmware version number should be from the Netgate website.
My experience in trying to update Firmware is to find the hardware product on the vendor website and look for a software or support download page, then hunt through the listing for the most recent BIOS to download. I have not been able to find any reference to downloading BIOS software on the Netgate site, or in the SG-2100 online documentation. Can someone gimme a link if I missed it. Thanks ....... or.......
Q: Is the U-BOOT Bios rolled into the "special" version of pfsense firmware for Netgate hardware that needs to be requested from the support team?
If yes, it would be good to add this detail to the SG-2100 online documentation to let customers know that there is no publicly available BIOS download, so it's pointless searching for it. Also it would be good to have a listing of the recommended U-Boot BIOS version number somewhere on the SG-2100 product page. The old SG products using CoreBoot had a package for CoreBoot BIOS upgrades, which made me think that the BIOS is independent of the pfSense firmware, but the SG-2100 doesn't appear to have a package to update the U-Boot BIOS. Are there any plans for a package to update the BIOS?
So, that is where I am at. I have two WWAN modules that work OK in an external USB adapter with Ubuntu Live Linux, but when mounted in the SG-2100, both can now get a cellular connection but this causes the ethernet switch to crash/freeze and I lose all LAN port connectivity.
I am looking for an integrated solution and would like to resolve this issue with the SG-2100,.... but I would accept an SG-3100 if it can solve the problem.
** Small edit to Bash script updating EM7455 firmware
At the start of this latest post I referenced the need for a small edit to the bash script that updates firmware in the EM7455.
The script "autoflash-7455.sh" downloaded from here contains a function called "flash_modem_firmware" on line 357, which invokes a perl script to do what the title says. The argument "--upload-download" on the command line was not being recognized and I had to change it to "--upload-qdl" in order to successfully flash the new firmware. I dropped a note to the author to let him know.
-
I would expect an EM7455 to work. I have not actually tested that in the SG-2100 but I have tested it in an SG-3100 and it works fine there.
I believe I may have replicated the LAN port issue here; it just took a lot longer. I have an SFP WAN connected, which may be influencing that.
One of our developers is attempting to replicate it now.
Steve
-
@stephenw10
Thanks for getting back to me so quickly on a Sunday.
Q: What version of U-Boot BIOS is your SG-2100 running? (Just want to confirm I am on the right version)
Q: Is there a separate BIOS for SG-2100 or is it rolled into the "special" pfsense builds that customers need to request for Netgate hardware?
Was going to call your distributor tomorrow to see if I can switch out my SG unit for a new one and test further. Is that a pointless exercise given there is something being explored by devs?
Thanks for your support
Andy
-
There is only one U-Boot version that should ever have been on a production device as far as I know:
Vendor: U-Boot
Version: 2018.03-devel-1.2.0ROGUE2-01.00.00.02+
Release Date: Fri Feb 7 2020
There were some earlier versions on prototypes etc. and there may be updates at some point, but there are none currently.
If there was an update it would likely be rolled into a pfSense update. If that was not practical it might be released as a package, but that's just speculation.
Given that I was able to replicate it, I would hold off on swapping it out until we have a better idea of the cause.
Did you try running the EM7305 in the external enclosure? Did the switch still fail?
Steve
-
Running the EM7305 in an external M.2/USB adapter connected to the SG-2100 appears to be working without causing the etherswitch to freeze. Well, at least I am comfortably past 10 mins without it freezing, which is good. Will update later this afternoon to confirm that it's a workable temporary fix, but I would like an integrated solution for deployment.
Can you elaborate on the work the devs are looking at and the potential timeline for feedback on whether it's likely to be fixed within 2 weeks?
Thanks Steve
-
I don't really have anything further at this point. I doubt there will be any updates here within two weeks. Even if the cause is found quickly a solution would be near impossible in that time frame.
Any further data points here can only help though.
Steve
-
Ok thanks for clarification Steve. .... so looks like we are talking months for a fix then. Weeks to get confirmation of an issue/fix schedule and then months to roll out in an official update? Is 6 months to resolution a reasonable assumption?
Couple of weeks ago you posted the following ... " I tested it with an em7345, though that is a different connection type".
Can you expand on that comment please.
- Does the EM7345 not use PPP as a connection?
- If I were to get an EM7345 card, is it likely I can achieve a working integrated solution with the SG-2100 that I have, or is it likely to be impacted by the issue being investigated?
Thanks Andy
-
-
The EM7345 card appears as a USB Ethernet device, with a separate AT port used only for connecting, potentially giving greater throughput.
Dec 7 20:31:08 kernel ugen0.2: <Sierra Wireless Inc. Sierra Wireless EM7345 4G LTE> at usbus0
Dec 7 20:31:08 kernel cdce0 on uhub0
Dec 7 20:31:08 kernel cdce0: <Sierra Wireless EM7345 4G LTE NCM> on usbus0
Dec 7 20:31:08 kernel umodem0 on uhub0
Dec 7 20:31:08 kernel ue0: <USB Ethernet> on cdce0
Dec 7 20:31:08 kernel ue0: Ethernet address: ff:ff:ff:ff:ff:ff
Dec 7 20:31:08 kernel umodem0: <Sierra Wireless EM7345 4G LTE> on usbus0
Dec 7 20:31:08 kernel umodem0: data interface 3, has no CM over data, has break
However I have no reason to think it would behave any differently with the switch.
The EM7305 did not show that behaviour initially either, it took some hours. So it's entirely possible the EM7345 would also have done given long enough. It isn't substantially different.
Similar power draw, similar emissions etc. You saw the same thing with an EM7455?
Steve
-
OK Thanks.
With regards to the question in your last sentence, .... yes, the switch locked up both with EM7305 and EM7455. For me, it locks up within 10 mins with both modules.
-
Ok, good to know. I'll add that to our internal report.
Steve
-
Do you know what LTE Band(s) you were using when you saw this?
We have been unable to replicate this on one device but it may be using different LTE bands, we are investigating.
Steve
-
Hi Steve
Apologies for the delay in responding.
Interesting comment. I am on B3 (with Three UK).
If you saw an issue (albeit it took longer than for me) and we're on the same freq/carrier, then it's an avenue to explore.
If the US devs are not seeing anything, it could be because they are not waiting long enough, or because the issue is freq band dependent (long shot but worth exploring with Sierra Wireless).
Using AT!GSTATUS? command I can see that carrier aggregation is not set up. Was it set up with your EM7455 to get the rates you reported earlier in the thread?
Thanks Andy
-
Ah, same as me. Also Three in the UK. Though it's hard to tell: it seems to move about a bit when the ppp link is not up, and you can't interrogate it when it is.
I never set up anything additionally, so if aggregation is not enabled by default I wouldn't have been using it.
I'm not sure which speeds you're referring to, but they would almost certainly have been with the 7305.
It's interesting: when I first tested this it took some hours to trigger the switch failure. I reset it and it triggered again overnight. However, now it has been up for days and seems OK.
It's definitely on the edge of whatever it is.
Steve