Netgate SG-1100s stop 'routing'
-
Within the span of a couple of days, we received notices from two separate customers that their SG-1100 had stopped 'routing'.
That is, devices connected to the LAN side of the SG-1100 now received an IP address from the network connected to the WAN side of the SG-1100 (in both cases the WAN is connected to the customer's own LAN, not directly to an ISP). When looking into it, it seemed that both SG-1100s were 'dead'. Some LEDs were lit, but the units could not be found on the network, the USB/serial port could not be reached, and power cycling didn't help.
I suppose this (to me) somewhat unexpected behaviour could come from the fact that the SG-1100 uses an internal network switch, which did still seem to work, but now only as a plain switch, without VLAN routing.
Both customers had used this setup for at least a year.
In both cases we replaced the SG-1100 with a router that was at hand (a DrayTek in this case) and it all worked again. I did not look into this myself; both units will be on my desk next week.
I'm not sure what firmware the devices were running, but I know it's recent. Has anyone experienced something like this?
-
@NeverSimple I am running an 1100 at one location and so far it has been trouble-free. Considering it has a built-in switch, I can see some type of programming error or failure that bridged the WAN and LAN together, and that's how you are pulling DHCP from the WAN... maybe. Not an impossible thing to happen, I suppose.
-
The internal switch in the 1100 is set up by uboot to separate the ports. It is then configured by pfSense however the pfSense config has it set.
So the only way it could be permanently running with the ports in the same layer 2 segment is if the hardware fails badly enough that uboot does not run and the switch is reset, so its setup is defaulted.
If the console is not responding at all, that could be what's happening. Some power component issue, maybe, if the switch IC is still powered?
Steve
-
A little more information from a colleague who ended up looking at both units:
One of the devices indeed worked as a 'switch'; it did seem to run uboot, but stopped at 'Loading kernel' (see attached log file).
The other device (at that time) did not work as a switch, but refused to boot up fully, stopping at the error: -sh: /etc/rc.initial: not found (see attached log).
Both devices were brought back to life with the Netgate installer (https://docs.netgate.com/pfsense/en/latest/solutions/sg-1100/reinstall-pfsense.html), so that probably rules out a permanent hardware failure.
The logs are from the time we looked at them in the office, which was roughly a week after they failed. There's no guarantee that what we found at that point was exactly the same as what happened in the first place. But neither router was working at the time of checking.
Since we don't know why both failed we are hesitant to put them back at the customers locations.
Can anyone tell what could have happened by looking at the (limited) logs?
OB20242109-Laude - Error- Boot.txt
OB20242101-Kranenburg Error Boot.txt
Richard
-
I'm not sure about the "Laude" log, but the second log is something we've seen after upgrade failures to 23.09[.1]. I haven't seen the issue repeat itself after a clean install on ZFS. I'd say the second one at least is fairly safe to put back after a clean install of 24.03.
-
Both look like a filesystem issue of some sort which would be corrected by the reinstall.
However, since uboot is completing on both, the switch ports should be isolated.
Try interrupting uboot to reach the Marvell>> prompt, then run printenv. Make sure there is an env named switch_disable. If it's missing, you can force update uboot from pfSense to add it. That should normally have been added during a uboot update.
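If you are checking several units, it may help to save the serial console output of printenv to a log file and check it mechanically. Below is a minimal Python sketch, assuming the usual U-Boot printenv format of one name=value per line. The function names and the sample text are hypothetical (not taken from an SG-1100), and the check only confirms that a variable named switch_disable exists; it says nothing about what its value should be.

```python
import re
from typing import Dict

# A U-Boot env line looks like "name=value"; values may themselves contain '='.
_ENV_LINE = re.compile(r"([A-Za-z0-9_.\-]+)=(.*)")


def parse_printenv(output: str) -> Dict[str, str]:
    """Parse captured 'printenv' output into a {name: value} dict.

    Prompt lines (e.g. 'Marvell>> printenv'), blank lines and the trailing
    'Environment size: ...' summary do not match and are skipped.
    """
    env: Dict[str, str] = {}
    for line in output.splitlines():  # also handles CRLF from serial logs
        match = _ENV_LINE.fullmatch(line.strip())
        if match:
            env[match.group(1)] = match.group(2)
    return env


def has_switch_disable(printenv_output: str) -> bool:
    """True if the captured environment defines a variable named switch_disable."""
    return "switch_disable" in parse_printenv(printenv_output)


# Made-up capture for illustration only: switch_disable is absent here.
sample = """Marvell>> printenv
baudrate=115200
bootdelay=3

Environment size: 512/65532 bytes
"""
print(has_switch_disable(sample))  # prints False
```

Matching only lines of the form name=value keeps console noise, such as the prompt itself, from being mistaken for environment variables.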