SG-2100 dropping LAN connections with EM7305 Cellular WAN
Ok, I was able to test this but have been unable to replicate it. The connection stays up, as do the switch ports.
[2.4.5-RELEASE][firstname.lastname@example.org]/root: usbconfig -d ugen0.2 dump_device_desc
ugen0.2: <Sierra Wireless, Incorporated EM7305> at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (500mA)
  bLength = 0x0012
  bDescriptorType = 0x0001
  bcdUSB = 0x0200
  bDeviceClass = 0x0000 <Probed by interface class>
  bDeviceSubClass = 0x0000
  bDeviceProtocol = 0x0000
  bMaxPacketSize0 = 0x0040
  idVendor = 0x1199
  idProduct = 0x9041
  bcdDevice = 0x0006
  iManufacturer = 0x0001 <Sierra Wireless, Incorporated>
  iProduct = 0x0002 <EM7305>
  iSerialNumber = 0x0003 <>
  bNumConfigurations = 0x0002
ppp0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
        inet 18.104.22.168 --> 10.64.64.0 netmask 0xffffffff
        inet6 fe80::2e0:edff:feb6:1359%ppp0 prefixlen 64 scopeid 0xc
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Even get reasonable speeds:
[2.4.5-RELEASE][email@example.com]/root: speedtest
Retrieving speedtest.net configuration...
Testing from Three (22.214.171.124)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by GMCHosting LLC (London) [1.20 km]: 52.674 ms
Testing download speed................................................................................
Download: 21.49 Mbit/s
Testing upload speed......................................................................................................
Upload: 26.36 Mbit/s
Have you been able to check the etherswitchcfg output?
@stephenw10 Thanks for that Stephen.
Yes, I have managed to do some testing in between answering the front door to the Amazon delivery guy, who I am sure feels like he is on a bungee cord pegged to my doorstep.
Here's what I have:
power on SG-2100 with SIM installed.
Connect laptop to LAN3 and login @ 192.168.1.1 => WAN is up, NTP and DNS both connected over WAN, traverse dashboard until LAN3 locks up and no more dashboard access from browser.
Pull LAN3 cable => See Photo below of LAN3 port LEDs frozen with no cable attached.
@ Console => etherswitchcfg => see picture below. Why are all of the ports showing "active" with HW loopback and 16 link aggregation groups?
@ Console => exit shell => select 5 to reboot the SG-2100 => select "Y" => unit reboots (I DO NOT power cycle) => WAN connects and gets an IP address but LAN port lights do not flash and I am unable to get an IP address when connecting a cable to any LAN port.
@ Console => etherswitchcfg => the switch function doesn't appear to have come up as there is no device listed in /dev .
@ Console => shutdown -p now => After unit has shutdown I disconnect power cable => wait 15 secs => reconnect pwr and reboot.
@ Console => etherswitchcfg => Note that all LAN ports 1-4 show no carrier as no cables are attached. Also note there are no LAGGs listed.
Inserting a cable into LAN port 3 serves laptop with an IP address and I can access GUI from a browser.
@ Console => etherswitchcfg => now lists LAN3 as active as to be expected.
Do Nothing => after 8 mins, lights on LAN3 have gone out, no connectivity from laptop to GUI via browser and "etherswitchcfg" command at console gives the following => all the lagg groups are back and all LAN ports are active in loopback only.
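To pin down exactly when the switch state flips in a sequence like the one above, the etherswitchcfg output could be polled and logged whenever it changes. This is a minimal sketch only, assuming a Python 3.7+ interpreter is available on the box (not verified here); `log_changes` and `run_etherswitchcfg` are hypothetical helper names, and the only command it relies on is the plain `etherswitchcfg` call already used in the steps.

```python
import subprocess
import time
from typing import Callable, List, Tuple


def run_etherswitchcfg() -> str:
    """Return whatever `etherswitchcfg` prints (no arguments, as in the steps above)."""
    result = subprocess.run(
        ["etherswitchcfg"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


def log_changes(
    read_state: Callable[[], str],
    samples: int,
    interval_s: float = 30.0,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, str]]:
    """Poll read_state() and record (timestamp, output) each time the output changes."""
    history: List[Tuple[float, str]] = []
    previous = None
    for i in range(samples):
        current = read_state()
        if current != previous:
            # First sample, or the reported switch state differs from the last one.
            history.append((clock(), current))
            previous = current
        if i < samples - 1:
            sleep(interval_s)
    return history
```

Calling `log_changes(run_etherswitchcfg, samples=40)` from the console would cover roughly 20 minutes at the default 30 second interval; the timestamps of the recorded changes show how long after boot the ports went from normal to "all active in loopback".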
Clearly something not right. Might very well be the EM7305 card as that is the thing I have that you don't. I took delivery of an EM7455 this morning and plan to test. Will report back how that goes after testing it, but unlikely to be before end of weekend.
Have a great Christmas break and thanks for helping out. We can pick up again next week, or whenever you're free.
Hmm, yeah that's very wrong!
I was testing with an identical card, EM7305. The one thing that is definitely different are the antennas I'm using and the positions I'm using them in.
Hard to imagine that makes any difference but I guess it could potentially.
You have your antenna in the logical place. I'm about 2mm shy of being able to connect to the "Main" interface from the socket next to the USB port using the pigtails I purchased from eBay.
Didn't want to "encourage them" to lean more toward the WWAN card as it felt quite rigid when I gave it a "nudge".
Not sure that the location of the main antenna above LAN1 would impact the decision of the ethernet switch function to not come up in the SoC. I am thinking more that it is an interrupt thing from the WWAN card that is causing the switch functionality to lock. Hopefully the new WWAN card will help move forward in understanding where the problem lies.
Mmmm, I agree the antenna location seems unlikely to make a difference there.
It looks like the switch IC is 'hanging' for want of a better term. It should run entirely independently of the CPU once the ports are set as forwarding which they clearly are.
I suspect that output from etherswitchcfg you're seeing is completely bogus. It's probably reading all 1s or all 0s from the switch registers because it's not actually responding at all. If you tried to set something on the switch I imagine it would fail.
I've not seen that happen before on anything though. I have no explanation for why it would only happen with the modem in action.
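The "all 1s or all 0s" idea can be expressed as a simple check. This is only an illustrative sketch: it does not read the switch itself, `looks_unresponsive` is a hypothetical helper, and the 16-bit register width is an assumption rather than something stated in this thread.

```python
from typing import Iterable


def looks_unresponsive(register_values: Iterable[int], width_bits: int = 16) -> bool:
    """True if every value read back is all zeros, or every value is all ones.

    width_bits=16 is an assumed register width, used only for illustration.
    """
    values = list(register_values)
    if not values:
        return False  # nothing was read, so nothing can be concluded
    all_ones = (1 << width_bits) - 1
    return all(v == 0 for v in values) or all(v == all_ones for v in values)


# A device that is not answering often reads back as a constant pattern.
print(looks_unresponsive([0xFFFF, 0xFFFF, 0xFFFF]))  # True
print(looks_unresponsive([0x1140, 0x796D, 0x0141]))  # False
```

A mix of different values does not prove the switch is healthy, but a uniform all-ones or all-zeros readback would be consistent with the output being bogus.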
OK, so I have done some more testing.
- Tested with another modem module and LAN ports are still dropping.
- Q: What's the latest recommended BIOS version for SG-2100?
- Q: Can I switch out my new SG-2100 for another one or SG-3100?
Detailed Summary of Testing
Re-profiled a DW5811e to a Generic EM7455 using the script and guidance available on this site.
=> Created a Ubuntu 18.04.5 LiveUSB
=> mounted modem in recommended external USB adapter
=> used bash script to reprofile module to be a Generic EM7455.
(FYI, Small glitch in the script required a minor edit for it to work for me .... see ** end of this post for details).
=> Running above script with "-s" option runs a list of AT commands to probe modem settings........ Output shows as follows. (I have edited out MEID/IMEI/FSN)
From my limited experience with LTE modules, it looks to be configured OK to work as a generic EM7455 card where AT commands should work for a PPP connection,.... but would welcome comments to the contrary.
Module was removed from external adapter, mounted in the SG-2100 and booted up, but no WAN IP address was issued.
It was rebooted multiple times over several hours of reading/testing to try and find anything I had missed in converting from a DW5811e to a Generic EM7455,... but it all looked good.
Also checked the following.
Are IPEX connectors mated firmly to EM7455? => Yes
Is the SIM plugged in? => Yes.
Is the AT port working? => Believe so,....@ pfsense dashboard => Status => System Logs => PPP log showed that the "AT" CHAT was working OK.
Is the right APN being issued for the ISP?
=> ...status page showed that modem was issuing correct APN via +CGDCONT command in the CHAT
=> authorization reported as "successful"
=> Link reported as "UP" and assigned to WAN
=> IPCP would attempt to negotiate IP address but never received one from the cell network
=> With peer not responding to echo requests the link is eventually closed and the cycle starts again.
This post described a similar situation that was overcome by checking the pfsense config file had a country path specified in the PPP section of the config.xml file........ I looked at that in the SG-2100 but mine had a country entry and it still wasn't working for me.
...... on the plus side, the LAN ports didn't crash while doing this analysis which is my original issue, but of little benefit with no WAN.
Move Module back to external adapter
With everything looking OK but the WAN not getting an IP address on the SG-2100 after multiple power cycle reboots, I decided to move the EM7455 back to external USB adapter, add the SIM and see if I could get an internet connection on the ubuntu laptop to at least confirm that the module was working.
=> Ubuntu ModemManager => select EM7455 => edit => Add connection => OK => Cellular connection available => ........internet connection was achieved
=> was able to do multiple Ookla speedtests giving 15Mbps down and 1.2Mbps up ...... (Rubbish performance I know, but hey,..... I'll take it!)
So the card works as an external USB modem, but I suspect it uses the MBIM interface with Ubuntu, which I believe is still not supported/available in FreeBSD, so there are two possible reasons it's not working:
- still something else missing in the module config for PPP mode to work on the SG-2100 as an internal modem
- something wrong with my SG-2100 HW/FW.
Now it Works?!
I moved it back to the SG-2100 and rebooted with the intention of writing AT commands to the module, ...... but surprisingly I now get a WAN IP address and internet access without doing anything further. Strange!..... Maybe the cellular network registered the WWAN/SIM combination when it connected as an external modem under ubuntu and reduced the level of authentication or reconnection steps required to reconnect the same WWAN/SIM combination when installed as an internal modem on FreeBSD in the SG-2100??........ (am just guessing out loud here. Do not like things fixing themselves without understanding root cause,...... any ideas anyone?).
Now it doesn't
So with the EM7455 now getting an IPV4 address and connecting to the internet,...I got to do 2.5 Ookla speed tests ........and then the LAN ports froze, same behavior as with the EM7305 module.
Given that the LAN ports "freezing" is now an issue exhibited with two separate WWAN modules I am leaning more toward the issue being with the SG-2100 unit.
What BIOS version should it be?
Stephen, your SG-2100 is working OK with EM7305 .....Q: What version of BIOS/Firmware is it running? I am assuming "latest production", but you may have something more advanced for testing?
My SG-2100 BIOS/Firmware details are :-
- BIOS: U-Boot 2018.03-devel-1.2.0Rogue2-01.00.00.02+ Feb 7th 2020
- pfsense 2.4.5p1
- FreeBSD 11.3-STABLE
I have assumed mine is the latest production, what with it being a brand new unit, but to be honest I am struggling to find what the latest firmware version number should be from the Netgate website.
My experience in trying to update Firmware is to find the hardware product on the vendor website and look for a software or support download page, then hunt through the listing for the most recent BIOS to download. I have not been able to find any reference to downloading BIOS software on the Netgate site, or in the SG-2100 online documentation. Can someone gimme a link if I missed it. Thanks ....... or.......
Q: Is the U-BOOT Bios rolled into the "special" version of pfsense firmware for Netgate hardware that needs to be requested from the support team?
.......If yes, it would be good to add this detail to the SG-2100 online documentation to let customers know that there is no publicly available BIOS download so it's pointless searching for it. Also it would be good to have a listing of the recommended U-BOOT BIOS version number somewhere on the SG-2100 product page. The old SG products using CoreBoot had a package for CoreBoot BIOS upgrades, which made me think that the BIOS is independent of the pfsense firmware, but the SG-2100 doesn't appear to have a package to update the U-BOOT BIOS. Are there any plans for a package to update the BIOS?
So,... that is where I am at. I have two WWAN modules that work OK in an external USB adapter with Ubuntu Live Linux, but when mounted in the SG-2100, both can now get a cellular connection but this causes the ethernet switch to crash/freeze and I lose all LAN port connectivity.
I am looking for an integrated solution and would like to resolve this issue with the SG-2100,.... but I would accept an SG-3100 if it can solve the problem.
** Small edit to Bash script updating EM7455 firmware
At the start of this latest post I referenced the need for a small edit to the bash script that updates firmware in EM7455.
The script "autoflash-7455.sh" downloaded from here, contains a function called " flash_modem_firmware" on line 357, which invokes a perl script to do what the title says. The argument "--upload-download" on the command line was not being recognized and I had to change to "--upload-qdl" in order to successfully flash with new firmware. I dropped a note to author to let him know.
I would expect an EM7455 to work. I have not actually tested that in the SG-2100 but I have tested it in an SG-3100 and it works fine there.
I believe I may have replicated the LAN port issue here it just took a lot longer. I have an SFP WAN connected which may be influencing that.
One of our developers is attempting to replicate it now.
Thanks for getting back to me so quick on a Sunday..
Q: What version of U-Boot BIOS is your SG-2100 running? (Just want to confirm I am on the right version)
Q: Is there a separate BIOS for SG-2100 or is it rolled into the "special" pfsense builds that customers need to request for Netgate hardware?
Was going to call your distributor tomorrow to see if I can switch out my SG unit for a new one and test further. Is that a pointless exercise given there is something being explored by devs?
Thanks for your support
There is only one uboot version that should ever have been on a production device as far as I know:
Vendor: U-Boot
Version: 2018.03-devel-1.2.0ROGUE2-01.00.00.02+
Release Date: Fri Feb 7 2020
There were some earlier versions on prototypes etc. and there may be updates at some point, but there are none currently.
If there was an update it would likely be rolled into a pfSense update. If that was not practical it might be released as a package but that's just speculation.
Given that I was able to replicate it I would hold on swapping it out until we have some better idea of the cause.
Did you try running the EM7305 in the external enclosure? Did the switch still fail?
Running the EM7305 in an external M.2/USB adapter connected to the SG-2100 appears to be working without causing the etherswitch to freeze. Well,... at least I am comfortably past 10 mins without it freezing, which is good. Will update later this afternoon to confirm that it's a workable temporary fix, but I would like an integrated solution for deployment.
Can you elaborate on the work the devs are looking at and the potential timeline for feedback on whether it's likely to be fixed within 2 weeks?
I don't really have anything further at this point. I doubt there will be any updates here within two weeks. Even if the cause is found quickly a solution would be near impossible in that time frame.
Any further data points here can only help though.
Ok thanks for clarification Steve. .... so looks like we are talking months for a fix then. Weeks to get confirmation of an issue/fix schedule and then months to roll out in an official update? Is 6 months to resolution a reasonable assumption?
Couple of weeks ago you posted the following ... " I tested it with an em7345, though that is a different connection type".
Can you expand on that comment please.
Does the EM7345 not use PPP as a connection?
If I was to get a EM7345 card is it likely I can achieve a working integrated solution with the SG-2100 that I have or is it likely to be impacted by the issue being investigated?
The EM7345 card appears as a USB Ethernet device with a separate AT port for connecting only. Potentially giving greater throughput.
Dec 7 20:31:08 kernel ugen0.2: <Sierra Wireless Inc. Sierra Wireless EM7345 4G LTE> at usbus0
Dec 7 20:31:08 kernel cdce0 on uhub0
Dec 7 20:31:08 kernel cdce0: <Sierra Wireless EM7345 4G LTE NCM> on usbus0
Dec 7 20:31:08 kernel umodem0 on uhub0
Dec 7 20:31:08 kernel ue0: <USB Ethernet> on cdce0
Dec 7 20:31:08 kernel ue0: Ethernet address: ff:ff:ff:ff:ff:ff
Dec 7 20:31:08 kernel umodem0: <Sierra Wireless EM7345 4G LTE> on usbus0
Dec 7 20:31:08 kernel umodem0: data interface 3, has no CM over data, has break
However I have no reason to think it would behave any differently with the switch.
The EM7305 did not show that behaviour initially either, it took some hours. So it's entirely possible the EM7345 would also have done given long enough. It isn't substantially different.
Similar power draw, similar emissions etc. You saw the same thing with an EM7455?
With regards to the question in your last sentence, .... yes, the switch locked up both with EM7305 and EM7455. For me, it locks up within 10 mins with both modules.
Ok, good to know. I'll add that to our internal report.
Do you know what LTE Band(s) you were using when you saw this?
We have been unable to replicate this on one device but it may be using different LTE bands, we are investigating.
Hi Steve, apologies for the delay in responding.
Interesting comment. I am on B3 (with Three UK).
If you saw an issue (albeit it took longer than me) and we're on same freq/carrier, then it's an avenue to explore.
If the US devs are not seeing anything, it could be because they are not waiting long enough, or because the issue is freq band dependent (long shot but worth exploring with Sierra Wireless).
Using AT!GSTATUS? command I can see that carrier aggregation is not set up. Was it set up with your EM7455 to get the rates you reported earlier in the thread?
Ah, same as me. Also Three in the UK. Though it's hard to tell: it seems to move about a bit when the ppp link is not up, and you can't interrogate it when it is.
I never set up anything additionally, so if aggregation is not enabled by default I wouldn't have been using it.
I'm not sure which speeds you're referring to but they would have been with the 7305 almost certainly.
It's interesting when I first tested this it took some hours to trigger the switch failure. I reset it and it triggered again overnight. However now it has been up for days and seems OK.
It's definitely on the edge of whatever it is.
I think CA is allocated by tower based on bands you allow on the client. I believe that if your modem is set to band group 09 (LTE ALL), then the tower cell should enable carrier aggregation based on other metrics that are dynamic and ISP dependent. (...... but would welcome being corrected if this is not correct).
With the AT!GSTATUS? command I see that my LTE CA state is "NOT ASSIGNED". I also see a bunch of signal quality and power measurements which might be useful to share.
I suspect mine are worse than yours and would be interested to note what you see.
RSRP = -118 dBm [ (-44dBm = good) <=> (-140dBm=Bad)]
RSRQ = -14.5 dB [ (-3dB=good) <=> (-19.5dB=Bad)]
SINR = 2 dB
RSRP = average received power from ref signal
RSRQ = indication of received signal quality
SINR = Signal to Interference plus Noise Ratio
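As a rough worked example of where those readings sit between the "good" and "bad" endpoints quoted above, the values can be mapped linearly onto a 0 to 1 scale. This is a sketch only: the linear mapping and the helper name `scale_position` are assumptions for illustration, not a standard signal quality score.

```python
def scale_position(value: float, bad: float, good: float) -> float:
    """Linear position of value between bad (0.0) and good (1.0), clamped to that range."""
    fraction = (value - bad) / (good - bad)
    return max(0.0, min(1.0, fraction))


# Endpoints and readings are the ones quoted in the post above.
rsrp_pos = scale_position(-118.0, bad=-140.0, good=-44.0)
rsrq_pos = scale_position(-14.5, bad=-19.5, good=-3.0)
print(f"RSRP: {rsrp_pos:.2f}, RSRQ: {rsrq_pos:.2f}")  # RSRP: 0.23, RSRQ: 0.30
```

Both readings land in the bottom third of their ranges on this simple scale.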
Overall I have felt that I am on outer edge of cell coverage and it is reflected in signal metrics above (UL/DL speeds and link metrics leaning to "Bad" end of scale). Given your higher data rates from EM7305 tests, I would be interested to know what your power figures are when seeing performance similar to what you reported in thread above ( 21.5Mbps DL / 26.3Mbps UL).
If your wireless performance is much better than mine reported above, it hints that "maybe" there is some correlation between quality of the wireless link and stability of the SoC firmware controlling the embedded switch. ..... yeah, a stretch I know, but that's where I am at the moment as I cannot use the SG-2100 with an integrated modem like I bought it for.
Is it worth trying to switch it out for a 3100 to test it?
I'm in London so signal strength is usually good. I've made no real effort to position it for better signal and it's surrounded by other firewalls etc. I see (last week):
!GSTATUS:
Current Time:  1550848        Temperature: 44
Bootup Time:   0              Mode:        ONLINE
System mode:   LTE            PS state:    Attached
LTE band:      B3             LTE bw:      15 MHz
LTE Rx chan:   1392           LTE Tx chan: 19392
EMM state:     Registered     Normal Service
RRC state:     RRC Connected
IMS reg state: No Srv
RSSI (dBm):    -71            Tx Power:    0
RSRP (dBm):    -95            TAC:         048C (1164)
RSRQ (dB):     -7             Cell ID:     000E1F02 (925442)
SINR (dB):     14.2
That's the EM7305 in an SG-2100 with random antennas!
I don't see anything listing CA state.
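For comparing readings between units, the band and signal fields can be pulled out of a pasted !GSTATUS response with a few regular expressions. A minimal sketch, assuming the field labels appear exactly as in the output above (other firmware versions may label them differently); `parse_gstatus` and `FIELD_PATTERNS` are hypothetical names.

```python
import re
from typing import Dict, Optional, Union

NUMBER = r"(-?\d+(?:\.\d+)?)"

# Labels copied from the !GSTATUS output quoted in this thread.
FIELD_PATTERNS = {
    "band": r"LTE band:\s*(\S+)",
    "rssi_dbm": r"RSSI \(dBm\):\s*" + NUMBER,
    "rsrp_dbm": r"RSRP \(dBm\):\s*" + NUMBER,
    "rsrq_db": r"RSRQ \(dB\):\s*" + NUMBER,
    "sinr_db": r"SINR \(dB\):\s*" + NUMBER,
}


def parse_gstatus(text: str) -> Dict[str, Optional[Union[str, float]]]:
    """Extract band and signal metrics; a field that is not found maps to None."""
    fields: Dict[str, Optional[Union[str, float]]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            fields[name] = None
        elif name == "band":
            fields[name] = match.group(1)
        else:
            fields[name] = float(match.group(1))
    return fields


sample = (
    "!GSTATUS: Current Time: 1550848 Temperature: 44 Bootup Time: 0 Mode: ONLINE "
    "System mode: LTE PS state: Attached LTE band: B3 LTE bw: 15 MHz "
    "LTE Rx chan: 1392 LTE Tx chan: 19392 RSSI (dBm): -71 Tx Power: 0 "
    "RSRP (dBm): -95 TAC: 048C (1164) RSRQ (dB): -7 "
    "Cell ID: 000E1F02 (925442) SINR (dB): 14.2"
)
fields = parse_gstatus(sample)
print(fields)
```

Because it searches for labels rather than relying on line layout, the same function works on the two-column console output and on a copy that has been flattened onto one line.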