SG-1100 Won’t Reboot on Upgrade - no internet access!
-
@SteveITS said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
@TangoOversway there is this?
https://docs.netgate.com/pfsense/en/latest/backup/restore-during-install.html#restore-configuration-from-media-during-install
I was using that. I had moved my config file to my restore USB and imported it during the install process. I made one mistake: I assumed it would see my backup config file and didn't realize the file had to be named exactly "config.xml". The filename pattern the backup feature uses is not recognized. But my issue came later, after install and reboot. It had issues with interfaces that, apparently, shouldn't have been there (and weren't in my config file). So for some reason, it had a problem with a legitimate config file that came from a pfSense backup and was later used successfully to restore my new install to my configuration.
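For anyone hitting the same naming problem, here is a minimal sketch of staging a backup for the installer. It only copies the file under the name config.xml, which is the one detail confirmed above. The function name stage_config and the usb_root argument (the mount point of the restore USB) are my own assumptions, and where on the media the installer expects the file should be checked against the linked docs.

```python
import shutil
from pathlib import Path


def stage_config(backup: Path, usb_root: Path) -> Path:
    """Copy a pfSense backup onto the restore media as config.xml.

    The backup keeps its original dated name on disk; only the copy on
    the USB stick is renamed so the installer can recognize it.
    usb_root is assumed to be the mounted USB stick (hypothetical path).
    """
    target = usb_root / "config.xml"
    shutil.copyfile(backup, target)
    return target
```

Calling stage_config(Path("my-dated-backup.xml"), Path("/path/to/usb")) leaves the dated original untouched, so the same backup can still be restored later from the web UI.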
@SteveITS said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
There are various threads about eMMC storage. TL;DR:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html#emmc
Packages that require/recommend SSD:
https://www.netgate.com/supported-pfsense-plus-packages
Come to think of it, it didn't occur to me that some packages won't work without the SSD. I do see that my device is notably slower in booting and in the web UI with it running from the USB stick, but I expected that, since a USB stick is always slower. (What didn't occur to me, and what I've never tested, is to use a microSD card in a USB adaptor and see if that's any faster than a USB stick. I don't know if it's a USB bottleneck or if the issue is a different type of memory in the device.)
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Do you mean where you reported seeing?: mountroot> random: unblocking device.
Yes. For the user, there's no indication that it can take time to do the unblocking, so it looks like it might have just frozen. And, as I mentioned, in my experience, a single keypress stopped the command. If the keypress didn't stop the command, then the command encountered an error and did not report it until my keypress. Either way, I, as the user, was not well informed on what was happening in that situation.
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
And did it just repeat random: unblocking device for some time?
It didn't do it one time after another during the same attempt, but in repeated reboots. I was thinking there were other issues, but your next comment I quote, below, responds to some of that.
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
That shouldn't happen in a normal boot. It may have to wait a second or two for the root drive to become available but that's it. If you see the mountroot> prompt that means it's tried to mount root and failed leaving you at the prompt. Some other background process is spamming the unblocking device message but effectively it is still waiting for input at the prompt.
So mountroot> is a prompt, but random: unblocking device is not. Okay, that was confusing, but it explains why a keypress seemed to interrupt what I thought was an unblocking process. When I kept seeing that over and over, I had worked out, in my head, what I thought was going on, so now I'm trying to remember what happened with this new information in mind. It sounds like there already was an error, or just hitting <return> (which I often did at that point) generated an error at that prompt. I forget what I'd get, but it might be in one of my screenshots.
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
I would guess that was a USB drive it was having problems with.
That was, if I remember correctly, after I would type run usbrecovery. That seems to mean that it was either having trouble using the files on the USB stick or had already started using them. And if it says mountroot>, does that mean it did some kind of pivot_root to change the root to the USB stick?
-
Separate from my responses already posted:
-
Now that I got it to restore properly, and another SG1100 is due to arrive Monday, I'd like to set this one up to use as a backup if the other fails.
-
Since I finally got it working properly, I would think I could power down, pull the USB stick it's running from now, and see if I can get it to restore to the internal SSD. As I mentioned many times, I am wondering if that drive may have gone bad. Is there anything in this thread that indicates that it's likely the internal storage is bad? (Also, if I can restore pfSense on the internal storage, it simplifies storage. I don't have to worry about the USB stick becoming separated from the SG1100 or being broken if something on the shelf falls on it and pushes on the USB stick and messes up the connector or the stick itself.)
-
Home Assistant has a nifty add-on that will back up the configuration to a Samba share on the local network. That's a nice safety feature, since it means if the HA system is borked, all that's needed is to set up a new system and reload the config from the Samba share. Is there anything like that with pfSense, where it will automatically save a config file to a NAS or other external storage regularly?
-
-
@TangoOversway The SG-1100 doesn't have an SSD; it's eMMC, is it not?
https://shop.netgate.com/products/1100-pfsense
Storage: 8GB eMMC storage.
-
One other MAJOR issue:
I can see this as a major security issue and I was so focused on just getting things working, I kept forgetting to bring it up.
When I was trying to get my firewall back online, there were times when it was connected to my LAN and the WAN, as normal. This was while I was trying to connect to the Netgate servers. The Starlink router was aware of devices on my LAN! So during setup, there was a direct network pass-through on my SG1100! On the Starlink mobile app, it's possible to connect with my router, even remotely (not through wifi, but through my router's communication with Starlink ISP). When I did this, it listed all my smart TVs, my desktop computers, my Home Assistant systems, my Sonos speakers, and a number of other devices on my LAN.
I don't think this is as much a threat in my situation, since I still had that router between my LAN and the internet and most Starlink users will use that router as their only firewall. Also Starlink uses CGNAT, so unless something on my LAN is phoning home for malware (which it might be able to do with pfSense anyway), it's not like someone could penetrate through the CGNAT. But if my firewall was the only safety device between my LAN and the internet, it would have been a security nightmare.
-
@TangoOversway said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
aware of devices on my LAN! So during setup, there was a direct network pass-through on my SG1100!
Not normally going to be possible unless you had a bridge setup or connected your starlink to a lan port and the rest of the network to another lan port..
But I'm not sure of all the things you did during your setup of pfsense. Those ports are all part of the same switch in the 1100, I do believe, not discrete interfaces.. So it's possible you had two ports in the same L2, and then yeah, devices on one port would be able to "see" devices connected to the other port.
So sure, during your setup it's possible all those ports were in the same L2.
-
automatically save a config file
There is this: https://docs.netgate.com/pfsense/en/latest/backup/autoconfigbackup.html
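If the goal is specifically a copy on a local NAS, one option is to pull the config file from the firewall on a schedule. The following is only a rough sketch under several assumptions: SSH is enabled on the firewall with key-based login, the live config sits at /cf/conf/config.xml (as far as I know that is where pfSense keeps it, but verify on your unit), and the host name, user and NAS directory are placeholders. The helper names backup_name and build_scp_command are made up for this example.

```python
from datetime import date
from pathlib import Path

# Assumption: location of the running config on the firewall.
REMOTE_CONFIG = "/cf/conf/config.xml"


def backup_name(hostname: str, day: date) -> str:
    """Build a dated file name so older copies are not overwritten."""
    return f"config-{hostname}-{day:%Y%m%d}.xml"


def build_scp_command(host: str, user: str, dest_dir: Path, day: date) -> list[str]:
    """Return the scp argument list that copies the config into dest_dir.

    Nothing is executed here; the caller decides when to run it.
    """
    dest = dest_dir / backup_name(host, day)
    return ["scp", f"{user}@{host}:{REMOTE_CONFIG}", str(dest)]
```

Passing the returned list to subprocess.run(cmd, check=True) from a cron job or scheduled task on the NAS side would perform the actual copy. Keeping the command building separate makes it easy to print and inspect before trusting it with a real firewall.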
Re it being a switch if unconfigured, that’s specific to the 1100. Didn’t look but perhaps Netgate could add a note to the reinstall directions to say to consider disconnecting from LAN and OPT during reinstall if it’s not there.
-
If the uboot envs have been updated during an upgrade, which it should have, then the switch LAN and OPT ports are disabled when uboot runs and remain so until pfSense boots.
If you see the mountroot> prompt that means the system has been unable to mount the root automatically. Usually that's because it's trying to mount the wrong thing but sometimes it can be because the usb subsystem takes too long to initialise.
The random: unblocking device message is unrelated to mounting root, you are just seeing it written to the console at that time. It's output from the random device showing that it has sufficient entropy to start responding with random data. Before that it is blocked.
-
@TangoOversway said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
When I was trying to get my firewall back online, there were times when it was connected to my LAN and the WAN, as normal. This was while I was trying to connect to the Netgate servers. The Starlink router was aware of devices on my LAN! So during setup, there was a direct network pass-through on my SG1100! On the Starlink mobile app, it's possible to connect with my router, even remotely (not through wifi, but through my router's communication with Starlink ISP). When I did this, it listed all my smart TVs, my desktop computers, my Home Assistant systems, my Sonos speakers, and a number of other devices on my LAN.
All the SG1100 NICs are, after power down, reset to a switched state.
I don't own or have any hands-on experience with an SG1100, but I guess that before OS (pfSense) initialization the NICs don't pass any traffic at all. There could, however, be a very short moment where the NIC/switch is activated and the firewall rules and routing aren't loaded yet, so the SG1100 3-port switch becomes what it is: a native switch. Several ms later, pfSense adds the VLAN config, the firewall rules and the routing rules.
It's during this short time that you saw the behavior: your pfSense LAN devices were in the same network as the pfSense WAN upstream router, the Starlink router.
If you have a switch connected to the pfSense LAN, and this switch was also power reset, all wired devices would have received a LAN link down/up event, and that would have triggered their DHCP clients. As soon as the pfSense became a switch for a very brief moment, it would have "relayed" the DHCP requests to the upstream router, the Starlink router.
That's why you see the pfSense LAN devices on the Starlink.
This would only last for a very brief moment.
Right after pfSense has started up, the lease the devices have just obtained is "wrong", as the pfSense LAN has now come up and its network is of course different (not 192.168.1.1/24 anymore). The devices aren't aware of this, of course, and this could introduce a somewhat broken network, as they have to restart their DHCP client to get a new lease, from pfSense this time.
If the pfSense LAN switch wasn't reset (rebooted) during the pfSense reboot, the pfSense LAN devices wouldn't be aware of the LAN event (pfSense restart) and they wouldn't have initiated a DHCP request = all is well.
At best, this could be annoying for you. Not a real security issue as you are after all behind a router, and behind a CGNAT, so no real risk.
Again: this is what I think can happen.
-
@johnpoz said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Not normally going to be possible unless you had a bridge setup or connected your starlink to a lan port and the rest of the network to another lan port..
and:
@SteveITS said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Re it being a switch if unconfigured, that’s specific to the 1100. Didn’t look but perhaps Netgate could add a note to the reinstall directions to say to consider disconnecting from LAN and OPT during reinstall if it’s not there.
and:
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
If the uboot envs have been updated during an upgrade, which it should have, then the switch LAN and OPT ports are disabled when uboot runs and remain so until pfSense boots.
and:
@Gertjan said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
All the SG1100 NICs are, after power down, reset to a switched state.
I don't own or have any hands-on experience with an SG1100, but I guess that before OS (pfSense) initialization the NICs don't pass any traffic at all. There could, however, be a very short moment where the NIC/switch is activated and the firewall rules and routing aren't loaded yet, so the SG1100 3-port switch becomes what it is: a native switch. Several ms later, pfSense adds the VLAN config, the firewall rules and the routing rules.
It's during this short time that you saw the behavior: your pfSense LAN devices were in the same network as the pfSense WAN upstream router, the Starlink router.
If you have a switch connected to the pfSense LAN, and this switch was also power reset, all wired devices would have received a LAN link down/up event, and that would have triggered their DHCP clients. As soon as the pfSense became a switch for a very brief moment, it would have "relayed" the DHCP requests to the upstream router, the Starlink router.
That's why you see the pfSense LAN devices on the Starlink.
This would only last for a very brief moment.
I don't know if usbboot is the installer, or if it runs it. I'm going to refer to them as one program, since that's what it looks like.
For clarity (which means I'm using a fair amount of detail):
I did run usbboot and it came up with the NIC configuration prompts. Then it tried to reach the Netgate servers. At this point, it was NOT reaching the servers. That's when I checked the status of my Starlink router with the mobile app. This app can connect to the Starlink router directly through wifi, if it's in wifi range of the router. In my case, as mentioned, there is 1,000' of fiber optic cable between my SG1100 and the Starlink router. It's out in our field, way out of wifi range from the house. That means I use a remote connection to my router, which goes from my phone, through cellular internet, to Starlink's servers, then to my router. So when I was in my house and reading the status info from the Starlink router, it had to go through the internet and not through wifi. When I got a report of Starlink seeing all my LAN devices, there was no way it could see them, other than by the hardwired connection through the SG1100.
This happened while the installer was trying to contact the Netgate servers. I'm not sure how long before that part of the program the connection was made, or if it would have worked after that point. Just for clarification on what I did, as I said, this was in the afternoon. I could NOT get it to connect to the Netgate servers at all, and finally gave in and reluctantly, several hours later, connected the LAN connection from the Starlink router to one of my LAN switches. (This is very unusual for me and one of the few times I've had an ISP router connect to my LAN without going through a firewall.) Later at night, with the Starlink router connected directly to my LAN, I moved the SG1100 upstairs, to my study, where I'd be more comfortable working with it and it'd be near my preferred workstation. It was after that when the SG1100 finally connected to the servers.
So there was a time, after I told the installer to leave both NICs up and use the default configuration, when the Starlink router was aware of what was on my LAN. Oddly, not every single device, not some IoT devices, but many devices that could ID themselves, such as Apple computers and Apple TV, some OctoPrint systems, some Linux systems, Sonos devices, and Home Assistant systems. (That's what I can recall at this point.)
So this was not a temporary condition while the NICs or software were initializing, or while one was up and the other was not. It was a stable condition for as long as the installer was in that state from the program. (And, oddly while Starlink could see the entire LAN, that was when the SG1100 was not able to reach a server. I'm wondering if the two could be connected - like the conflicting address space @stephenw10 have both been concerned about - or that allowing the passthrough with other devices on the LAN side might have created other issues.)
I triple-checked everything on this, including the connections, my information from Starlink's mobile app, and all the wiring connections and routing. I have some perception and reading issues, so I am used to verifying my work. Yes, it can take me a bit longer than many to put together or figure out how some systems or things fit together, but that means I'm used to that and used to what's going on in my head more than most. I can state, with 100% certainty, that my LAN devices WERE visible, through the SG1100, to the Starlink router, during the time the SG1100 was trying to reach the servers.
I have pointed this out to TAC support in my discussion with them, and pointed out it's a critical issue. They have thanked me for my feedback. Whether that means they're now watching this thread for more info or, at the other end of possibilities, have thanked me and filed my comments in the bit bucket (what we used to call "File 13" or "The Circular File"), I don't know. I think this is a critical issue, since the primary purpose of pfSense and an SG1100 is to protect what is on the LAN side from the WAN side, and during this time that is not happening.
I strongly agree with @SteveITS: Netgate could (I say should) add a note to disconnect WAN and OPT during the install.
@Gertjan said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
At best, this could be annoying for you. Not a real security issue as you are after all behind a router, and behind a CGNAT, so no real risk.
Yes, in my case, I'm dealing with much less of a threat than most people. Still, I haven't trusted ISPs since I found out that Verizon had been messing around with the settings on my Verizon router - proving they could connect to that router and, therefore, my entire LAN, even with secure firewall rules. As I said, I don't trust Starlink, since tying their service into a political agreement (about recognizing Martian colonies as independent) tells me they are okay mixing their agenda with their service when the two have nothing to do with each other. (Again, this is NOT about current politics or about Musk - leaving him OUT of the conversation.)
But I see the system being an open pathway for more than a couple seconds at a time as a serious issue.
-
I consider the issue of my SG1100 being just a "pass through" device for one phase of the install so important I did not want to include other topics in that reply. So handling other stuff here.
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
If you see the mountroot> prompt that means the system has been unable to mount the root automatically. Usually that's because it's trying to mount the wrong thing but sometimes it can be because the usb subsystem takes too long to initialise.
The random: unblocking device message is unrelated to mounting root, you are just seeing it written to the console at that time. It's output from the random device showing that it has sufficient entropy to start responding with random data. Before that it is blocked.
Okay, you've been thorough and made a few points about this, but this really helped me put the pieces together with what you've already said (that's for me - you've been clear and helpful - just took me a bit to put it together).
So the two items are not at all connected. I get that.
I would think a fix for that would be to include an error message like, "Root file system could not be mounted," before the mountroot prompt appears. Also, it might be that error messages that could appear after a prompt like that need a linefeed or something before them so they don't show up after a prompt, on the same line. I don't know the system, though, so I don't know if that's practical. It would be easier to read if it did not look like it was part of the mountroot prompt. (To me it looked like both were part of the same message.)
-
@TangoOversway said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
I did run usbboot
The installer is also activating the NICs, so all the NICs become active in their default state: a switch, without any further firewall rules etc. As long as you stay in the installer, which is a stripped down FreeBSD OS, this situation is valid. That explains what you saw.
Very IMHO of course. I actually hope to be wrong.
-
If you interrupt the boot again to reach the Marvell>> prompt where you previously ran usbrecovery and instead run printenv, you will see all the current uboot envs.
They should include:
switch_disable=switch phy_write 1 0 0 0xffff; switch phy_write 2 0 0 0xffff; echo "Switch Ports Disabled";
and
preboot=run switch_disable;
That is run very early during the boot to isolate the switch ports. If you really try hard, like running a ping flood, you might get some packets through but it should not be connected long enough for dhcp.
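Since printenv output can be long, a small script can do the check on a saved serial-console capture. This is just a sketch: it assumes the capture has one name=value pair per line (wrapped lines from a narrow terminal would need rejoining first), and the function names parse_uboot_env and switch_isolation_configured are invented for the example. It looks only for the two entries quoted above.

```python
def parse_uboot_env(text: str) -> dict[str, str]:
    """Turn captured printenv output into a name -> value mapping.

    Lines without '=' (blank lines, the size summary) are skipped.
    Only the first '=' splits, so values may contain '=' themselves.
    """
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        env[name] = value
    return env


def switch_isolation_configured(env: dict[str, str]) -> bool:
    """True if switch_disable exists and preboot actually runs it."""
    return "switch_disable" in env and "run switch_disable" in env.get("preboot", "")
```

If switch_isolation_configured returns False for your capture, that would be consistent with the envs never having been updated on that unit.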
-
@stephenw10
Thanks for that info
-
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
They should include:
switch_disable=switch phy_write 1 0 0 0xffff; switch phy_write 2 0 0 0xffff; echo "Switch Ports Disabled";
and
preboot=run switch_disable;
That is run very early during the boot to isolate the switch ports. If you really try hard, like running a ping flood, you might get some packets through but it should not be connected long enough for dhcp.
The way I read that, if you run preboot=run switch_disable it will block the connection, but I did have and could see a connection. So shouldn't that be run by default or maybe the installer should do it automatically?
-
Nope, the preboot env is evaluated automatically by uboot before it runs the boot env. You shouldn't need to do anything.
-
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Nope the preboot env is evaluated automatically by uboot before it runs the boot env. You shouldn't need to do anything.
But shouldn't that have prevented my Starlink router on the WAN from seeing everything on the LAN? I thought I got how that happened, since the switch was active, but rules were not. Apparently that's not what's going on.
-
Yes it should. Which implies the uboot envs may never have been updated. That was not set in the 1100 initially, it was added specifically to address this issue and should be applied automatically at pfSense upgrade.
-
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Yes it should. Which implies the uboot envs may never have been updated. That was not set in the 1100 initially, it was added specifically to address this issue and should be applied automatically at pfSense upgrade.
So I did see the devices, and it wasn't something misleading and this does indicate there is a problem with the 1100 providing pass-through?
I'm not saying that as an, "I was right!" kind of thing - just being sure I am following what is going on. I'm sure this is the kind of thing Netgate would resolve quickly, so I'm not blaming or accusing. Just trying to verify that either I messed up or that it's an issue that's going to be handled.
-
Yeah I'm suggesting you almost certainly don't have those uboot envs for some reason.
If you need to you can force it to rewrite uboot and update the envs from pfSense like so:
[root@1100-3.stevew.lan]/root: /usr/local/share/u-boot/1100/u-boot-update.sh -f
=> U-Boot is already at the latest version. Continuing with the installation...
=> Updating the Netgate 1100 U-boot
==> Reading current settings
==> Updating the U-boot image (this may take a few minutes)
64+0 records in
64+0 records out
4194304 bytes transferred in 53.925072 secs (77780 bytes/sec)
==> Updating settings
==> Restoring settings
writing u-boot env(1)... done
-
I have a replacement that was supposed to arrive today - but FedEx delivered it to the wrong address. (We have continual issues with FedEx making proper deliveries.) Once I get the replacement, my only plan for this unit was to keep it stored as an emergency replacement. Since I can keep the OS on the USB stick and always have that, I might try to re-install on the built-in storage.
If I do that, or anything else with this unit, I will not have the LAN and WAN plugged in at the same time until I'm sure the configuration I upload is working.
Does the Netgate symbol by your name mean you're with Netgate? If so, are they looking into this or do I need to file a bug or incident report?
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Yeah I'm suggesting you almost certainly don't have those uboot envs for some reason.
I bought this in 2020, almost exactly 5 years ago. Is that long enough ago that things could be different?