SG-1100 Won’t Reboot on Upgrade - no internet access!
-
@TangoOversway said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
The Starlink router was aware of devices on my LAN! So during setup, there was a direct network pass-through on my SG1100!
That's not normally going to be possible unless you had a bridge set up, or connected your Starlink to a LAN port and the rest of the network to another LAN port.
But I'm not sure of all the things you did during your setup of pfSense. Those ports are all part of the same switch in the 1100, I do believe, not discrete interfaces. So it's possible you had two ports in the same L2, and then yeah, devices on one port would be able to "see" devices connected to the other port.
So sure, during your setup it's possible all those ports were in the same L2.
-
automatically save a config file
There is this: https://docs.netgate.com/pfsense/en/latest/backup/autoconfigbackup.html
Re it being a switch if unconfigured, that’s specific to the 1100. I didn’t look, but perhaps Netgate could add a note to the reinstall directions, if one isn’t there already, to say to consider disconnecting from LAN and OPT during reinstall.
-
If the uboot envs have been updated during an upgrade, which they should have been, then the switch LAN and OPT ports are disabled when uboot runs and remain so until pfSense boots.
If you see the mountroot> prompt that means the system has been unable to mount the root automatically. Usually that's because it's trying to mount the wrong thing but sometimes it can be because the usb subsystem takes too long to initialise.
The random: unblocking device message is unrelated to mounting root, you are just seeing it written to the console at that time. It's output from the random device showing that it has sufficient entropy to start responding with random data. Before that it is blocked.
-
@TangoOversway said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
When I was trying to get my firewall back online, there were times when it was connected to my LAN and the WAN, as normal. This was while I was trying to connect to the Netgate servers. The Starlink router was aware of devices on my LAN! So during setup, there was a direct network pass-through on my SG1100! On the Starlink mobile app, it's possible to connect with my router, even remotely (not through wifi, but through my router's communication with Starlink ISP). When I did this, it listed all my smart TVs, my desktop computers, my Home Assistant systems, my Sonos speakers, and a number of other devices on my LAN.
All the SG1100 NICs are, after power down, reset to a switched state.
I don't own or have any hands-on experience with an SG1100, but I guess that before OS (pfSense) initialization the NICs don't pass any traffic at all. There could be a very short moment, though, where the NIC/switch is activated and firewall rules and routing aren't loaded yet, and the SG1100's 3-port switch becomes what it is: a native switch. Several ms later, pfSense adds the VLAN config, adds firewall rules and adds routing rules.
It's during this short time that you saw the behavior: your pfSense LAN devices were in the same network as the pfSense WAN's upstream Starlink router.
If you have a switch connected to the pfSense LAN, and this switch was also power reset, all wired devices would have received a link down/up event, and that would have triggered their DHCP client. As soon as the pfSense became a switch for a very brief moment, it would "relay" the DHCP requests to the upstream router, the Starlink router.
That's why you see the pfSense LAN devices on the Starlink.
This situation would only last very briefly.
Right after pfSense has started up, the lease the devices have just obtained is "wrong", as the pfSense LAN has now come up and its network is of course different (not 192.168.1.1/24 anymore). Devices aren't aware of this of course, and this could introduce a somewhat broken network, as they have to restart their DHCP client to get a new lease, from pfSense this time.
If the switch on the pfSense LAN wasn't reset (rebooted) during the pfSense reboot, the pfSense LAN devices wouldn't be aware of the LAN event (pfSense restart) and they wouldn't have initiated a DHCP request = all is well.
At best, this could be annoying for you. Not a real security issue as you are after all behind a router, and behind a CGNAT, so no real risk.
Again: this is what I think can happen.
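To illustrate the "wrong lease" point, here is a minimal sketch using Python's standard ipaddress module. The subnets and the leased address are hypothetical (the thread only mentions 192.168.1.1/24 for the upstream side), so treat them as placeholders for your own networks.

```python
import ipaddress


def lease_matches_lan(lease_ip, lan_cidr):
    """Return True if a leased address falls inside the given LAN subnet."""
    # strict=False lets us pass an interface-style value such as 192.168.10.1/24.
    network = ipaddress.ip_network(lan_cidr, strict=False)
    return ipaddress.ip_address(lease_ip) in network


# Assumed values, for illustration only.
UPSTREAM_NET = "192.168.1.1/24"   # upstream (Starlink) router's network
PFSENSE_LAN = "192.168.10.1/24"   # hypothetical pfSense LAN
STALE_LEASE = "192.168.1.57"      # hypothetical lease handed out upstream

if __name__ == "__main__":
    print("lease fits upstream network:", lease_matches_lan(STALE_LEASE, UPSTREAM_NET))
    print("lease fits pfSense LAN:", lease_matches_lan(STALE_LEASE, PFSENSE_LAN))
```

A device holding a lease that fits the upstream network but not the pfSense LAN would need to renew its lease before it works normally again.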
-
@johnpoz said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
That's not normally going to be possible unless you had a bridge set up, or connected your Starlink to a LAN port and the rest of the network to another LAN port.
and:
@SteveITS said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Re it being a switch if unconfigured, that’s specific to the 1100. I didn’t look, but perhaps Netgate could add a note to the reinstall directions, if one isn’t there already, to say to consider disconnecting from LAN and OPT during reinstall.
and:
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
If the uboot envs have been updated during an upgrade, which they should have been, then the switch LAN and OPT ports are disabled when uboot runs and remain so until pfSense boots.
and:
@Gertjan said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
All the SG1100 NICs are, after power down, reset to a switched state.
I don't own or have any hands-on experience with an SG1100, but I guess that before OS (pfSense) initialization the NICs don't pass any traffic at all. There could be a very short moment, though, where the NIC/switch is activated and firewall rules and routing aren't loaded yet, and the SG1100's 3-port switch becomes what it is: a native switch. Several ms later, pfSense adds the VLAN config, adds firewall rules and adds routing rules.
It's during this short time that you saw the behavior: your pfSense LAN devices were in the same network as the pfSense WAN's upstream Starlink router.
If you have a switch connected to the pfSense LAN, and this switch was also power reset, all wired devices would have received a link down/up event, and that would have triggered their DHCP client. As soon as the pfSense became a switch for a very brief moment, it would "relay" the DHCP requests to the upstream router, the Starlink router.
That's why you see the pfSense LAN devices on the Starlink.
This situation would only last very briefly.
I don't know if usbboot is the installer, or if it runs it. I'm going to refer to them as one program, since that's what it looks like.
For clarity (which means I'm using a fair amount of detail):
I did run usbboot and it came up with the NIC configuration prompts. Then it tried to reach the Netgate servers. At this point, it was NOT reaching the servers. That's when I checked the status of my Starlink router with the mobile app. This app can connect to the Starlink router directly through wifi, if it's in wifi range of the router. In my case, as mentioned, there is 1,000' of fiber optic cable between my SG1100 and the Starlink router. It's out in our field, way out of wifi range from the house. That means I use a remote connection to my router, which goes from my phone, through cellular internet, to Starlink's servers, then to my router. So when I was in my house and reading the status info from the Starlink router, it had to go through the internet and not through wifi. When I got a report of Starlink seeing all my LAN devices, there was no way it could see them, other than by the hardwired connection through the SG1100.
This happened while the installer was trying to contact the Netgate servers. I'm not sure how long before that part of the program the connection was made, or if it would have worked after that point.
Just for clarification on what I did: as I said, this was in the afternoon. I could NOT get it to connect to the Netgate servers at all, and finally gave in and reluctantly, several hours later, connected the LAN connection from the Starlink router to one of my LAN switches. (This is very unusual for me and one of the few times I've had an ISP router connect to my LAN without going through a firewall.) Later at night, with the Starlink router connected directly to my LAN, I moved the SG1100 upstairs to my study, where I'd be more comfortable working with it and it'd be near my preferred workstation. It was after that when the SG1100 finally connected to the servers.
So there was a time, after I told the installer to leave both NICs up and use the default configuration, when the Starlink router was aware of what was on my LAN. Oddly, not every single device, not some IoT devices, but many devices that could ID themselves, such as Apple computers and Apple TV, some OctoPrint systems, some Linux systems, Sonos devices, and Home Assistant systems. (That's what I can recall at this point.)
So this was not a temporary condition while the NICs or software were initializing, or while one was up and the other was not. It was a stable condition for as long as the installer was in that state. (And, oddly, while Starlink could see the entire LAN, that was when the SG1100 was not able to reach a server. I'm wondering if the two could be connected - like the conflicting address space @stephenw10 and I have both been concerned about - or that allowing the passthrough with other devices on the LAN side might have created other issues.)
This is something where I triple-checked everything, including the connections, my information from Starlink's mobile app, and all the wiring connections and routing. I have some perception and reading issues, so I am used to verifying my work. Yes, it can take me a bit longer than many to put together or figure out how some systems or things fit together, but that means I'm used to that and used to what's going on in my head more than most. I can state, with 100% certainty, that my LAN devices WERE visible, through the SG1100, to the Starlink router, during the time the SG1100 was trying to reach the servers.
I have pointed this out to TAC support, in my discussion with them, and pointed out it's a critical issue. They have thanked me for my feedback. (Whether that means they're now watching this thread for more info or, at the other end of possibilities, have thanked me and filed my comments in the bit bucket, what we used to call "File 13" or "The Circular File", I can't say.) I think this is a critical issue, since the primary purpose of pfSense and an SG1100 is to protect what is on the LAN side from the WAN side and during this time, that is not happening.
I strongly agree with @SteveITS: Netgate could (I say should) add a note to disconnect WAN and OPT during the install.
@Gertjan said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
At best, this could be annoying for you. Not a real security issue as you are after all behind a router, and behind a CGNAT, so no real risk.
Yes, in my case, I'm dealing with much less of a threat than most people. Still, I haven't trusted ISPs since I found out that Verizon had been messing around with the settings on my Verizon router - proving they could connect to that router and, therefore, my entire LAN, even with secure firewall rules. As I said, I don't trust Starlink, since tying their service into a political agreement (about recognizing Martian colonies as independent) tells me they are okay mixing their agenda with their service when the two have nothing to do with each other. (Again, this is NOT about current politics or about Musk - leaving him OUT of the conversation.)
But I see the system being an open pathway for more than a couple seconds at a time as a serious issue.
-
I consider the issue of my SG1100 being just a "pass through" device for one phase of the install so important I did not want to include other topics in that reply. So handling other stuff here.
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
If you see the mountroot> prompt that means the system has been unable to mount the root automatically. Usually that's because it's trying to mount the wrong thing but sometimes it can be because the usb subsystem takes too long to initialise.
The random: unblocking device message is unrelated to mounting root, you are just seeing it written to the console at that time. It's output from the random device showing that it has sufficient entropy to start responding with random data. Before that it is blocked.
Okay, you've been thorough and made a few points about this, but this really helped me put the pieces together with what you've already said (that's for me - you've been clear and helpful - just took me a bit to put it together).
So the two items are not at all connected. I get that.
I would think a fix for that would be to include an error message like, "Root file system could not be mounted," before the mountroot prompt appears. Also, it might be that error messages that could appear after a prompt like that need a linefeed or something before them so they don't show up after a prompt, on the same line. I don't know the system, though, so I don't know if that's practical. It would be easier to read if it did not look like it was part of the mountroot prompt. (To me that looked like both were part of the same message.)
-
@TangoOversway said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
I did run usbboot
The installer is also activating the NICs - so all the NICs become active in their default state: a switch, without any further firewall rules etc. As long as you stay in the installer, which is a stripped down FreeBSD OS, this situation persists. That explains what you saw.
Very IMHO of course. I actually hope to be wrong.
-
If you interrupt the boot again to reach the Marvell>> prompt, where you previously ran usbrecovery, and instead run printenv, you will see all the current uboot envs.
They should include:
switch_disable=switch phy_write 1 0 0 0xffff; switch phy_write 2 0 0 0xffff; echo "Switch Ports Disabled";
and
preboot=run switch_disable;
That is run very early during the boot to isolate the switch ports. If you really try hard, like running a ping flood, you might get some packets through but it should not be connected long enough for dhcp.
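If you capture the printenv output from the serial console into a text file, a short script can do the comparison for you. This is only a sketch: the two env names come from the lines above, while the sample text, the bootdelay line and the function names are made up for illustration, and it only checks that switch_disable is defined and that preboot runs it.

```python
def parse_printenv(text):
    """Turn 'name=value' lines from a captured printenv dump into a dict."""
    envs = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name:
            envs[name] = value
    return envs


def missing_switch_fix(printenv_text):
    """Return the names of the switch-isolation envs that look absent."""
    envs = parse_printenv(printenv_text)
    missing = []
    if not envs.get("switch_disable"):
        missing.append("switch_disable")
    if "run switch_disable" not in envs.get("preboot", ""):
        missing.append("preboot")
    return missing


# Hypothetical capture; only the last two lines are taken from this thread.
SAMPLE = """bootdelay=2
switch_disable=switch phy_write 1 0 0 0xffff; switch phy_write 2 0 0 0xffff; echo "Switch Ports Disabled";
preboot=run switch_disable;
"""

if __name__ == "__main__":
    print("missing envs:", missing_switch_fix(SAMPLE))
```

An empty list suggests both envs are present; anything else points at uboot envs that were never updated.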
-
@stephenw10
Thanks for that info
-
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
They should include:
switch_disable=switch phy_write 1 0 0 0xffff; switch phy_write 2 0 0 0xffff; echo "Switch Ports Disabled";
and
preboot=run switch_disable;
That is run very early during the boot to isolate the switch ports. If you really try hard, like running a ping flood, you might get some packets through but it should not be connected long enough for dhcp.
The way I read that, if you run preboot=run switch_disable it will block the connection, but I did have and could see a connection. So shouldn't that be run by default, or maybe the installer should do it automatically?
-
Nope, the preboot env is evaluated automatically by uboot before it runs the boot env. You shouldn't need to do anything.
-
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Nope the preboot env is evaluated automatically by uboot before it runs the boot env. You shouldn't need to do anything.
But shouldn't that have prevented my Starlink router on the WAN from seeing everything on the LAN? I thought I got how that happened, since the switch was active, but rules were not. Apparently that's not what's going on.
-
Yes it should. Which implies the uboot envs may never have been updated. That was not set in the 1100 initially, it was added specifically to address this issue and should be applied automatically at pfSense upgrade.
-
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Yes it should. Which implies the uboot envs may never have been updated. That was not set in the 1100 initially, it was added specifically to address this issue and should be applied automatically at pfSense upgrade.
So I did see the devices, and it wasn't something misleading and this does indicate there is a problem with the 1100 providing pass-through?
I'm not saying that as an, "I was right!" kind of thing - just being sure I am following what is going on. I'm sure this is the kind of thing Netgate would resolve quickly, so I'm not blaming or accusing. Just trying to verify that either I messed up or that it's an issue that's going to be handled.
-
Yeah I'm suggesting you almost certainly don't have those uboot envs for some reason.
If you need to you can force it to rewrite uboot and update the envs from pfSense like so:
[root@1100-3.stevew.lan]/root: /usr/local/share/u-boot/1100/u-boot-update.sh -f
=> U-Boot is already at the latest version. Continuing with the installation...
=> Updating the Netgate 1100 U-boot
==> Reading current settings
==> Updating the U-boot image (this may take a few minutes)
64+0 records in
64+0 records out
4194304 bytes transferred in 53.925072 secs (77780 bytes/sec)
==> Updating settings
==> Restoring settings
writing u-boot env(1)... done
-
I have a replacement that was supposed to arrive today - but FedEx delivered it to the wrong address. (We have continual issues with FedEx making proper deliveries.) Once I get the replacement, my only plan for this unit was to keep it stored as an emergency replacement. Since I can keep the OS on the USB stick and always have that, I might try to re-install on the built-in storage.
If I do that, or anything else with this unit, I will not have the LAN and WAN plugged in at the same time until I'm sure the configuration I upload is working.
Does the Netgate symbol by your name mean you're with Netgate? If so, are they looking into this or do I need to file a bug or incident report?
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Yeah I'm suggesting you almost certainly don't have those uboot envs for some reason.
I bought this in 2020, almost exactly 5 years ago. Is that long enough ago that things could be different?
-
Yes I work at Netgate.
Yes 5 years ago is long enough that it may have shipped without that fix. Running the above command will add the appropriate uboot envs.
The new 1100 should already have them if you just ordered it but since it will be available for testing I'd encourage you to check it to see how to access it etc when not against the clock!
-
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
The new 1100 should already have them if you just ordered it but since it will be available for testing I'd encourage you to check it to see how to access it etc when not against the clock!
So I can check printenv, as you mentioned, and see if it includes:
switch_disable=switch phy_write 1 0 0 0xffff; switch phy_write 2 0 0 0xffff; echo "Switch Ports Disabled";
and
preboot=run switch_disable;
Right?
-
Yes, run printenv from uboot, at the Marvell>> prompt, to see the currently configured envs.
-
Another thought - since you've been interested in the issue of the same address space on both sides and since we both thought that's why it could not reach the servers.
First I had it in the usual setup place, one CAT5 going to Starlink, one going to my LAN. That's when I was trying, over and over, to get run usbboot to work. I even tried disabling the LAN interface. I could not get through no matter what. And, as we've discussed, with the 1100 set up that way, Starlink was still seeing my entire LAN.
I figured that if there were going to be issues with Starlink getting to my LAN, that had already happened. So I connected Starlink to my switch and took my 1100 out.
I keep asking, "What was different between that setup and the one I used where it finally got through to the servers?" Well, first, it tried multiple times and did not get through to them every time. So maybe there's some randomness involved.
The second setup, when it did work, was after Starlink was acting as DNS for my entire LAN, not just the 1100's WAN interface. I took the 1100 upstairs (I know physical position isn't an issue), so I could work on it near my desktop. I hooked up the WAN side to my LAN (which, to me, seems the same as hooking it up to the Starlink router, just with more systems on the same connection). But this time I did NOT hook up anything to the LAN side. As best I can remember, that's the big difference. Also, I found I could get the connection to Netgate servers with the LAN NIC up, but NOT with it down. (And it was never plugged in.)
So I'm wondering if what made the difference in connecting to Netgate was that both LAN and WAN had the same address space, but that, somehow, having nothing on the LAN side to give an address to could have made the difference.
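For anyone wanting to rule the address-space question in or out on their own setup, here is a minimal sketch using Python's standard ipaddress module. The subnets below are assumptions for illustration only, since the actual WAN and LAN ranges aren't given in this thread.

```python
import ipaddress


def subnets_overlap(wan_cidr, lan_cidr):
    """Return True if the WAN-side and LAN-side networks share any addresses."""
    # strict=False accepts interface-style values such as 192.168.1.1/24.
    wan = ipaddress.ip_network(wan_cidr, strict=False)
    lan = ipaddress.ip_network(lan_cidr, strict=False)
    return wan.overlaps(lan)


if __name__ == "__main__":
    # Hypothetical case: both sides on the same default-style range.
    print(subnets_overlap("192.168.1.1/24", "192.168.1.1/24"))
    # Hypothetical case: LAN moved to a different range.
    print(subnets_overlap("192.168.1.1/24", "192.168.10.1/24"))
```

If the first kind of result comes back True for the real WAN and LAN ranges, the firewall has two interfaces claiming the same network, which would fit the routing trouble discussed above.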