SG-1100 Won’t Reboot on Upgrade - no internet access!
-
Akismet is flagging this as spam. Bet it's due to the XML data.
@SteveITS said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Double check your USB stick for other config.xml files? em0/em1 should not be interfaces in an 1100 config file. It should use mvneta0.4090 etc.
From my config (only one on the USB stick):
<interfaces>
  <wan>
    <enable></enable>
    <if>mvneta0.4090</if>
    <switchif>switch0.port3</switchif>
    <descr><![CDATA[WAN]]></descr>
    <alias-address></alias-address>
    <alias-subnet>32</alias-subnet>
    <spoofmac></spoofmac>
    <ipaddr>dhcp</ipaddr>
    <dhcphostname></dhcphostname>
    <dhcprejectfrom></dhcprejectfrom>
    <adv_dhcp_pt_timeout></adv_dhcp_pt_timeout>
    <adv_dhcp_pt_retry></adv_dhcp_pt_retry>
    <adv_dhcp_pt_select_timeout></adv_dhcp_pt_select_timeout>
    <adv_dhcp_pt_reboot></adv_dhcp_pt_reboot>
    <adv_dhcp_pt_backoff_cutoff></adv_dhcp_pt_backoff_cutoff>
    <adv_dhcp_pt_initial_interval></adv_dhcp_pt_initial_interval>
    <adv_dhcp_pt_values>SavedCfg</adv_dhcp_pt_values>
    <adv_dhcp_send_options></adv_dhcp_send_options>
    <adv_dhcp_request_options></adv_dhcp_request_options>
    <adv_dhcp_required_options></adv_dhcp_required_options>
    <adv_dhcp_option_modifiers></adv_dhcp_option_modifiers>
    <adv_dhcp_config_advanced></adv_dhcp_config_advanced>
    <adv_dhcp_config_file_override></adv_dhcp_config_file_override>
    <adv_dhcp_config_file_override_path></adv_dhcp_config_file_override_path>
    <dhcpcvpt>bk</dhcpcvpt>
    <ipaddrv6>dhcp6</ipaddrv6>
    <dhcp6-duid></dhcp6-duid>
    <dhcp6-ia-pd-len>0</dhcp6-ia-pd-len>
    <dhcp6cvpt>bk</dhcp6cvpt>
    <adv_dhcp6_prefix_selected_interface>wan</adv_dhcp6_prefix_selected_interface>
  </wan>
  <lan>
    <enable></enable>
    <if>mvneta0.4091</if>
    <switchif>switch0.port2</switchif>
    <descr><![CDATA[LAN]]></descr>
    <spoofmac></spoofmac>
    <ipaddr>172.16.7.1</ipaddr>
    <subnet>22</subnet>
    <ipaddrv6>track6</ipaddrv6>
    <track6-interface>wan</track6-interface>
    <track6-prefix-id>0</track6-prefix-id>
  </lan>
  <opt1>
    <if>mvneta0.4092</if>
    <descr><![CDATA[OPT]]></descr>
    <enable></enable>
    <spoofmac></spoofmac>
  </opt1>
</interfaces>
I can't find em0 or em1 in there at all, other than inside a long string of random numbers and letters that looks like a crypto key or something similar.
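For anyone who wants to check a config.xml for stray interface names without reading it by eye, here is a minimal Python sketch. It assumes the file has an `<interfaces>` section like the one above, either as the whole document or nested inside a root element (the `<pfsense>` wrapper in the sample is an assumption for illustration). The function names and the `mvneta0` prefix check are hypothetical helpers, not part of any pfSense tool.

```python
import xml.etree.ElementTree as ET


def interface_assignments(config_xml):
    """Map each entry under <interfaces> (wan, lan, opt1, ...) to its <if> value."""
    root = ET.fromstring(config_xml)
    # Accept either the bare <interfaces> fragment or a full config that nests it.
    section = root if root.tag == "interfaces" else root.find(".//interfaces")
    if section is None:
        return {}
    return {child.tag: (child.findtext("if") or "") for child in section}


def unexpected_interfaces(assignments, expected_prefix="mvneta0"):
    """Return the assignments whose device name does not start with the expected prefix."""
    return {
        name: dev
        for name, dev in assignments.items()
        if not dev.startswith(expected_prefix)
    }


# Trimmed-down sample using the interface names from the post above.
sample = """<pfsense><interfaces>
<wan><if>mvneta0.4090</if></wan>
<lan><if>mvneta0.4091</if></lan>
<opt1><if>mvneta0.4092</if></opt1>
</interfaces></pfsense>"""

found = interface_assignments(sample)
print(found)
print(unexpected_interfaces(found))
```

Running the same functions over every config.xml on the USB stick would show quickly whether any of them assigns em0 or em1.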
@SteveITS said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Try assigning just WAN to mvneta0 and see if it lets you not assign LAN at all, at least to get to the menu.
If you mean do that now, after boot (post install), I've tried that and it didn't accept it.
I'll try a new install and disable the LAN when I do it. I'm reading up on installing it to a USB stick first.
-
Is the net installer the one I downloaded and have been using? I tried searching for "pfsense net installer" and didn't get anything useful.
-
Yes, the Net Installer is what you downloaded from the store.
I assume the config you are restoring was from the 1100?
The em NICs it's complaining about there are probably from the default config. pfSense builds a config based on a default file with additions for specific hardware. So for an 1100 it should see that and add the default VLANs and switch config. You should not see em0,em1.
So somehow it's losing the config that would have been generated at install.
I suggest installing clean and keeping the default config until you're able to access the webgui. Then restore your config there.
-
Re-installing. Got to this screen:
I notice both are mvneta0. Later when I have to name the interface (in the post install part where I was caught in a loop), I'm wondering if I should have connected to the LAN. There was no name other than mvneta0 as an option. (I tried mvneta0.4090, as suggested, and got an error.)
-
Yes, those are the correct default interfaces for the 1100. It only has one NIC (mvneta0) so the interfaces are VLANs on that NIC.
After install it should boot completely without asking you to reassign the NICs. It's unclear why it somehow pulled in the pfSense default config with em0 and em1, which don't exist in the 1100.
-
To re-assign WAN as that after install you have to answer Yes when it asks if you want to create VLANs. Then create 4090 and 4091 on mvneta0. Then it will allow you to set mvneta0.4090 as WAN.
-
Do you have the TAC ticket ID you opened? They usually respond to those in minutes.
-
I thought I opened a TAC ticket late last night, but had left the form up so I could get the SN and other info from my box. So I filled that in and sent it in today - maybe an hour ago, maybe longer.
I'm back to trying to reach the servers. I've deactivated the LAN and am trying it over and over.
I'm wondering if there might be a reason why it only took a few retries in the early morning (US Eastern time) and during the day it's just not connecting.
Again, I see the LEDs flashing on the RJ45 and it doesn't complain about the NIC being inactive or anything.
This is the part where I wonder if a different IP address would help.
-
I had disabled the LAN and it couldn't reach the servers. Enabled it and it did, first try. Then I realized I forgot to put in the blank USB stick in the USB3.0 socket, so I had to go back and restart. Again left the LAN on and it went through first time. So it's formatting and preparing to install to the USB stick.
A thought on that: While I have a new SG1100 coming in next week, I'm wondering if, once I get it working on the USB stick, it would be easy to copy or clone that system to the main drive and see if it works on there.
Ah - it's fetching and stuff now. So I guess I can take a break and get one or two things done while it spends time doing that.
-
You had to assign it? Or it detected it?
You will have to set LAN as none or change its subnet in the installer to avoid a conflict there.
-
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
You had to assign it? Or it detected it?
First I went through and specifically picked "None" or whatever the option was to not detect or use it. And it wouldn't connect to the servers.
Then I canceled and let the install restart. When it got there, I just hit <return> and let it keep the values. Then it connected to the server without a problem - two times. (I had to do it a 2nd time so I could plug in the USB stick I wanted to install it on.)
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
You will have to set LAN as none or change its subnet in the installer to avoid a conflict there.
It seems to be working without that. It's got the LAN set up (as I said, I just hit <return>). But this is with the initial install at this point, where I can't touch the subnets.
Is this something Netgate should look into, since at least one ISP now is forcing a 192.168.1.xxx address space? Starlink is often a "last choice" when it's the only choice and they're all over the US and Canada now and I think in many other countries worldwide, so I would think this could become an issue.
-
Yes in retrospect pfSense should probably have used a different default subnet. The problem now is that it's been that for so long changing it would cause confusion at best.
But we are aware of the issue and you should be able to set it in any install situation you find.
-
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
Yes in retrospect pfSense should probably have used a different default subnet. The problem now is that it's been that for so long changing it would cause confusion at best.
I get that - and there was no way of foretelling the arrogance of Starlink and their decision in what was, at the time, years in the future.
BUT
I wonder how hard it would be to check the WAN during install and, if the address it's been given is in that default subnet for pfSense, offering the user a choice to pick a different subnet to use. Also, this is during the install, not when it's configured, so, perhaps, switching to a different subnet just for the install, then switching back to the default at the end of the install might work. At that point, once it's done what it has to do during the install (or even during the post-reboot configuration), it could change back to the default subnet OR offer the user the choice.
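To make the idea concrete, here is a rough Python sketch of that check. This is not how the pfSense installer works; it only illustrates the logic. It assumes the default LAN is 192.168.1.0/24, and the list of fallback subnets and the function name `pick_lan_subnet` are made up for the example.

```python
import ipaddress

# Assumed default LAN network (the 192.168.1.xxx space discussed above).
DEFAULT_LAN = ipaddress.ip_network("192.168.1.0/24")

# Hypothetical fallback candidates, chosen only for illustration.
FALLBACKS = [
    ipaddress.ip_network(net)
    for net in ("192.168.2.0/24", "10.10.10.0/24", "172.16.0.0/24")
]


def pick_lan_subnet(wan_cidr, default=DEFAULT_LAN, fallbacks=FALLBACKS):
    """Return the first candidate LAN network that does not overlap the WAN network.

    wan_cidr is the address the WAN received, with prefix length,
    for example "192.168.1.57/24".
    """
    wan_net = ipaddress.ip_interface(wan_cidr).network
    for candidate in [default, *fallbacks]:
        if not candidate.overlaps(wan_net):
            return candidate
    raise ValueError("every candidate LAN subnet overlaps the WAN network")


# WAN handed out an address in the default LAN space: move the LAN elsewhere.
print(pick_lan_subnet("192.168.1.57/24"))  # 192.168.2.0/24

# WAN is somewhere else entirely: keep the default.
print(pick_lan_subnet("203.0.113.9/24"))   # 192.168.1.0/24
```

The same function could drive either behavior: prompt the user when the result differs from the default, or silently use it for the install and switch back afterwards.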
@stephenw10 said in SG-1100 Won’t Reboot on Upgrade - no internet access!:
But we are aware of the issue and you should be able to set it in any install situation you find.
Well, somehow, it's working for me, but it's not something that's changeable during the install itself, which is when I've been having the problem.
-
Made it through all the past install crashes. It's still extracting packages, which is taking a good while, but it's working on a USB stick, which is going to be much slower than the internal drive. It's well past the boost-libs-1.85 package, which was the child process that was always killed. (That was package 53/177. It's on the last package now.)
-
I'd be happier if I had not already seen this and had to go back! (Still, it's progress.)
-
Even better!
It never asked me for any LAN info or to connect to the internet or anything like that. All the stuff I dealt with before is no issue now, at all. The only question it asked before this menu came up was to set my password.
And notice it's using the same address space on both NICs. Well, we'll change that when I upload my config through the web.
I have to leave for an appointment soon, so if I don't post an update, what I'm going to do now is to halt the system, disconnect the WAN connection, connect the LAN directly to my Mac, and upload my backed up configuration.
I'm surprised that if I had just left the LAN interface defined during setup, this time things would have gone smoothly. Yesterday there were so many wacky things that didn't make sense, it's like a dream. I had problems with serial connections, with it booting, and other issues I reported. I'm wondering if a lot of them could be explained by failing internal storage. That could have led to problems booting, reading settings and info, and maybe more.
-
Just this part to show the web config is up and running AND it's using my config info, with my LAN address space!
I was hoping to take it downstairs and plug it in where it belongs and quickly check to see if things are behaving, but I won't be able to now, since it has to finish reinstalling packages.
I've got an appointment and about 90 minutes of driving coming up (to get there and back), so that'll give me time to mentally review everything. I want to list the issues that came up and see if it's possible to figure out why they weren't a problem today when they were so hard to deal with yesterday.
-
Well looking much better at least!
-
Short version: It's all working fine now.
Also, I want to thank everyone who chipped in and helped along the way. Some people spent a lot of time on this thread writing helpful comments and suggestions and reading through a lot of my details to help me work things out. That is deeply appreciated!
This may be long, but I'm trying to be thorough. I think pfSense is a strong program and I've been using it for so long I don't remember when I first used it. I want to say it was somewhere around 2005, but I'm not sure. First I used the open source version on a Soekris net5501 (if that's the right model number) for many years, then switched to the SG1100 5 years ago. If any of this rotten experience helps improve any part of the program, I'll be glad it helps.
Yesterday (Thursday) was horrendous and I feel like I went through a worst-case restore situation. The only two ways it could have been worse are if my SG1100 had completely crashed and I didn't have a config backup, or if I had not backed up the config before it went bad. I think it's important to stress that it did not just crash on its own. It crashed as I was upgrading, not when I tried something tricky or experimental.
I've been thinking through what happened and what could have made it easier for me to restore my device to functionality. I can see why there is no way to do a factory reset. I do wish that the boot firmware had ssh included so there would be a way to connect without a serial cable, but I understand there are probably reasons against that.
So that leaves my experience and the things that went wrong (or the things that were good). I think, since so much went wrong, this is a good case study for Netgate, since almost anything that could go wrong went wrong.
Issues I faced that can likely be fixed:
- With usbrestore, I ran into a problem when the device was being deblocked (unblocked? not sure of the term or the exact message. I think it's in a screenshot or comment upthread). The message is not as clear as it could be and says something like, "Device being deblocked." It takes time, so it's hard to tell if it's doing something or if it just got hung up. As a user, this is confusing. Is it hung up or working in the background? Pressing any button apparently (in my experience) terminates the process, leaving the device in an unknown condition. From there it's not possible to proceed without running it again and facing the same issue. Suggested fix: Add a line at the start saying, "(This could take several minutes.)" Since Netgate knows what devices they have and that this will run on, it might even be possible to include a time, like, "This may take up to 5 minutes." Another way to handle it is to include a counter on how many blocks have been zeroed out or some other status information that changes while the program is running. The purpose is to let the user know the program didn't hang, it's just busy. Also, if the user presses a button, rather than quit, the program could prompt with something like, "Still unblocking device. Abort? (y/N):"
I think this issue (including the deblocking aborting on a keypress) added several hours to my restore process.
- Default address space on LAN can conflict with WAN. In my case, with Starlink as an ISP, I have no way to change the address space on the WAN side. It's far from optimal for Starlink to force the 192.168.1.xxx address space, but they do. It's the same address space pfSense defaults to. @stephenw10 and I have both expressed concerns about this. However, when things finally worked well, it didn't seem to be an issue. (Oddly, it was a nightmare on Thursday, but when I tried doing things again on Friday, it wasn't an issue, and I cannot figure out what was different. On Friday, when things worked, at at least two points pfSense reported the addresses for both the WAN and LAN, and they were both in the same address space. How that worked at all is beyond me.) Suggested fix: Before connecting to the Netgate servers, check the WAN IP address. If it's in the 192.168.1.xxx address range, give the user a choice of using a different address space on the LAN. It may be better, though, to not give a choice. Since the LAN connection is not used at all until after the reboot, if the WAN is using this address space, either automatically deactivate the LAN NIC or change to another address space. Then, before rebooting, change back to the default address space. This way there is no conflict and the user doesn't have to deal with the issue at all.
I don't know what went on and why it took hours and dozens of tries to connect to the Netgate servers, but this issue probably added 4-5 hours to my restore process. (And I'm not exaggerating on the timing!)
The last time I tried the install, almost everything was perfect - what wasn't has been addressed in the issues I list below. But for some reason, it just would NOT work properly at all for hours and hours when I first tried everything.
- I had an install of v24.11 and v24.03 both crash at the same point (discussed upthread). Basically it was during expansion of downloaded packages. I suspect it was a driver failure. I don't know where the packages are downloaded to and where they are unpacked. Just in case, I tried a hack and made my own USB restore drive by formatting a 256GB USB stick in FAT32, then copying the files from a net install image on a USB stick onto my stick. That provided a lot of free space. It may have been coincidence, but the failure message (about a background process having to be killed) was no help. My guess was that it was a storage problem and, again, maybe the sign of a failing drive. (How long do the drives in an SG1100 tend to last? Should I just plan on replacing these units every so many years?) If things fail during the download and unpacking, it would be nice if the drive space was checked and reported in any abort or error messages to indicate whether the problem could be storage.
- Restoring with my configuration file led to multiple issues. (Discussed upthread.) There were reports from the post-install configuration (after reboot) about 2 interfaces that, apparently, are not even on the SG1100. I'm not sure what was going on here, but my configuration file was the one I downloaded from the SG1100 before I started my failed upgrade and it was loaded from the web interface and installed without issue. I have every reason to believe it's a good configuration file, but the issues that showed up because of me trying to restore with it cost me 2-3 hours. When I tried the same steps, but without the configuration file, I finally got a good restore. (It just occurred to me that I use Tailscale. I don't know if that creates virtual devices, but maybe that could be part of what caused this.)
- Include a section in the docs about restoring to a USB stick. I finally decided to do that and things worked perfectly. That may be the one thing that fixed all the other problems. While the only 2 extra steps (setting the env options) are simple, it would be worth it to have a section on that in the docs. This post says a lot of what's needed, but having it in the docs would prove helpful to a lot of people.
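As a rough illustration of the suggestion above about reporting drive space when a step fails, here is a small Python sketch. It is not based on the actual installer code. The names `free_space_note` and `run_step` are invented for the example, and it simply attaches free-space figures for a given path to whatever error a step raises.

```python
import shutil


def free_space_note(path="/"):
    """Describe the free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    pct_free = 100 * usage.free / usage.total if usage.total else 0.0
    return (
        f"{path}: {usage.free // 2**20} MiB free of "
        f"{usage.total // 2**20} MiB ({pct_free:.0f}% free)"
    )


def run_step(name, step, path="/"):
    """Run one install step; on failure, re-raise with disk usage in the message."""
    try:
        return step()
    except Exception as exc:
        raise RuntimeError(
            f"{name} failed: {exc} [{free_space_note(path)}]"
        ) from exc


def extract_packages():
    # Stand-in for a real step that fails partway through.
    raise OSError("child process killed")


try:
    run_step("extract packages", extract_packages, path=".")
except RuntimeError as err:
    print(err)
```

With something like this, a "process killed" message would at least show whether the target drive was nearly full at the time.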
Intermittent or Tricky Issues that May Not be Debuggable
- I've mentioned several times that I think the issue could be that my drive is failing. That would explain several issues, maybe even explain all of them. I think it would be quite useful to add an option to the boot menus to make it easy to run fsck to verify the system storage devices. Maybe even add a prompt when the Marvell shell comes up about what command can be run to verify device integrity. This would be incredibly helpful, since most of us may not be familiar with just what the underlying OS is (yes, it's BSD, but a lot of users are inexperienced with BSD and don't know what tools are available on it). Some kind of prompting or including a menu item to verify drive integrity would be a major help.
- Flaky serial connections: Connecting with the serial console is easy on Mac and Linux, just screen <device node> 115200. It's almost foolproof, but at one point, when I brought the device up to my study to make it more comfortable to work on it, I tried booting over and over and often saw gibberish on the serial connection. There were times it was a matter of characters not showing up, so I might get something like "eger" instead of "Netgear." I also had times when the serial connection failed altogether. This could have been because of my cable, but I also noticed that there were a number of times that adjusting the cable connection on the SG1100 fixed things. I think the position of the USB-B microconnector on the motherboard and the thickness of the case may create an issue where the ends of some USB cables aren't long enough to fit into the connector well. Making a case with an indentation around the connector or putting it just a little bit closer to the edge of the PCB might fix this. I have no idea, though, what caused the flaky serial connection most of the time.
- Boot issues: I had multiple times when I tried to boot and got a message that it was trying to boot and I saw a "T" (or, if I remember, sometimes an asterisk) on the serial console, then a space, and, after a wait, another T (or asterisk). I don't know what was going on, but this took time and never was followed by a proper boot. At one point I was having serious trouble for a while getting it to boot and provide a non-garbled serial connection. I think the boot issue could be explained by a failing storage device, but I don't know if the serial connection issue could be.
I've spent a lot of time thinking this through and I hope it helps the devs at Netgate.
-
@TangoOversway there is this?
https://docs.netgate.com/pfsense/en/latest/backup/restore-during-install.html#restore-configuration-from-media-during-install
There are various threads about eMMC storage. TL;DR:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html#emmc
Packages that require/recommend SSD:
https://www.netgate.com/supported-pfsense-plus-packages