Panic booting 2.6.0 on Jetway NF692G6-420
-
An MCA error is almost always a hardware problem, so if you're seeing it consistently in 2.6 and not at all in 2.5.2, it's probably some device that isn't enabled in 2.5.2.
Steve
-
$ cat mce.log
MCA: Bank 0, Status 0xb200000010000400
MCA: Global Cap 0x0000000000000c07, Status 0x0000000000000004
MCA: Vendor "GenuineIntel", ID 0x506c9, APIC ID 4
MCA: CPU 2 UNCOR PCC internal timer error
$ mcelog --no-dmi --ascii --file mce.log
mcelog: Family 6 Model 92 CPU: only decoding architectural errors
mcelog: Family 6 Model 92 CPU: only decoding architectural errors
Hardware event. This is not a software error.
CPU 2 BANK 0
MCG status:MCIP
STATUS b200000010000400 MCGSTATUS 4
MCGCAP c07 APICID 4 SOCKETID 0
CPUID Vendor Intel Family 6 Model 92 Step 9
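For anyone wondering where "UNCOR PCC" comes from, the architectural flag bits of the Bank 0 status value can be decoded by hand. This is a minimal POSIX sh sketch, assuming the IA32_MCi_STATUS bit layout from the Intel SDM (VAL in bit 63 down to PCC in bit 57, MCA error code in bits 15..0); the function name `decode_mci_status` is hypothetical and not part of mcelog.

```shell
#!/bin/sh
# Decode the architectural fields of a 64-bit Intel IA32_MCi_STATUS value
# (bit layout per the Intel SDM, Vol. 3B, machine-check architecture chapter).
# The value is split into two 32-bit halves so the shell never has to parse
# a hex constant larger than a signed 64-bit integer.
decode_mci_status() {
    hex=${1#0x}
    if [ ${#hex} -ne 16 ]; then
        echo "expected exactly 16 hex digits, got '$1'" >&2
        return 1
    fi
    hi=$(( 0x${hex%????????} ))   # bits 63..32
    lo=$(( 0x${hex#????????} ))   # bits 31..0
    # Flag bits 63..57 are bits 31..25 of the high half.
    for pair in VAL:31 OVER:30 UC:29 EN:28 MISCV:27 ADDRV:26 PCC:25; do
        echo "${pair%%:*}=$(( (hi >> ${pair##*:}) & 1 ))"
    done
    # Bits 15..0: architectural MCA error code; bits 31..16: model-specific code.
    printf 'MCACOD=0x%04x\n' $(( lo & 0xffff ))
    printf 'MSCOD=0x%04x\n' $(( (lo >> 16) & 0xffff ))
}

decode_mci_status 0xb200000010000400
```

For the value in mce.log this reports VAL, UC, EN and PCC set and an MCA error code of 0x0400, which the SDM's simple error code table lists as an internal timer error, consistent with the "UNCOR PCC internal timer error" line above.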
tl;dr: Hardware issue.
It might be something in the EFI/BIOS. An EFI or BIOS update might help, or maybe switching between EFI and legacy booting, but that's just a guess.
Jetway is not known for quality hardware, though, so it's also possible it's an actual hardware problem with that CPU.
-
@jimp Actually, I cannot complain about the Jetway hardware I have in use (not a lot, but double digits). This is the first time I've had any significant issue with any of it.
Even if the actual error is caused by something in the hardware, 2.5.2 runs perfectly fine, so the suggestion upthread that it may be exposed by a kernel change makes sense to me.
It looks like there is a BIOS update available, I will try that soon.
I have not booted into the 2.6.0 installer at all yet; if that works, perhaps it will give me a clue. Most likely I will end up bisecting the kernel, which will be so much fun ...
-
On occasion, a newer base OS will use some new feature of the hardware and uncover a latent problem. So even if it is related to the newer base OS, that doesn't necessarily rule out a hardware problem, though it may be a specific hardware device or function that the old version never touched.
-
There is more wrong here than just a "simple" hardware issue that is exposed by pfSense 2.6.
- FreeBSD 12.3 and 13.0 install fine as well, and I can bring up the network and do some basic testing without any indication of trouble.
- pfSense 2.6.0 installs (from scratch) without problems, and reboots and runs without the network connected, but when I plug in the WAN link it freezes after no more than ten seconds.
- Same for the SYNC link while pfsync is enabled at the other end.
In addition, I cannot get it to send or receive anything on the LAN interface(s). The original configuration was an LACP LAGG over igb0/igb1. I reduced it to a static LAGG, then to a single interface, and each time saw only outgoing traffic, on the firewall and on the switch alike. Neither side received anything from the other. Of course, I tried every combination and several cables.
I am now back on 2.5.2 with the original configuration, and everything is working just as before.
A second NF692G6-420 behaves the same insofar as it panics on the first reboot after installation. Experimenting any further seems pointless.
Conclusion: If I want to use any pfSense after 2.5.2, I need different hardware. How nice.
-
@chrullrich said in Panic booting 2.6.0 on Jetway NF692G6-420:
There is more wrong here than just a "simple" hardware issue that is exposed by pfSense 2.6.
- FreeBSD 12.3 and 13.0 install fine as well, and I can bring up the network and do some basic testing without any indication of trouble.
Just curious -- which FreeBSD? Did you try STABLE or RELEASE? pfSense now uses the STABLE branch, and that differs from the RELEASE with the same version number. pfSense 2.5.2 was based on FreeBSD 12.2 STABLE; pfSense 2.6.0 is based on FreeBSD 12.3 STABLE.
So a fair test would need to be done on the STABLE branch for FreeBSD. Just mentioning this because some folks grab RELEASE and don't realize that STABLE can be quite different when it comes to drivers (and bugs).
With all that said, pfSense does run on a "customized" FreeBSD, so there are some changes. If you see different behavior between FreeBSD 12.3 STABLE and pfSense, that might point to a pfSense issue (or it might still come down to the difference in patch level between the 12.3 STABLE you test and the one pfSense 2.6.0 is built on).
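Following the STABLE versus RELEASE point, one quick sanity check before comparing two systems is to compare their release strings with any patch suffix stripped. A minimal sketch, assuming release strings in the form printed by `uname -r` or `freebsd-version -k` (for example `12.3-STABLE` or `13.0-RELEASE-p7`); the helper names are hypothetical. Matching strings only mean the same version on the same branch, not the same snapshot of STABLE, which is the patch-level caveat mentioned above.

```shell
#!/bin/sh
# Classify a FreeBSD release string by branch.
freebsd_branch() {
    case "$1" in
        *-STABLE*)  echo STABLE ;;
        *-RELEASE*) echo RELEASE ;;
        *-CURRENT*) echo CURRENT ;;
        *)          echo other ;;
    esac
}

# Drop a trailing "-pN" patch-level suffix: "13.0-RELEASE-p7" -> "13.0-RELEASE".
strip_patch() {
    echo "${1%-p[0-9]*}"
}

# Succeed if both strings name the same version on the same branch.
same_base() {
    [ "$(strip_patch "$1")" = "$(strip_patch "$2")" ]
}

# Example, run on the FreeBSD test box:
#   same_base "$(uname -r)" 12.3-STABLE && echo "same version and branch"
```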
-
Does it make any difference which NIC you have assigned as WAN?
Do all 6 NICs use the igb(4) driver?
Steve
-
OK, I think I figured it out, and this is embarrassing. Short version: the ACPI OS selection was set to Windows, and things work much better with it set to Linux, although I'm not completely sure that fixed the panics. It fixed something, though.
Long version:
The BIOS on the NF692G6 has the usual ACPI OS selection, which (of course) defaults to Windows. The other options available are Linux and MSDOS, and since FreeBSD is neither Linux nor MSDOS, I figured I might as well leave it at the default. Big mistake.
I set up a test lab with a single WAN instead of two and a single LAN instead of ~10. From the start, I saw an entirely different problem than before: rather than panicking or just freezing once they received CARP or pfsync traffic, each individual network interface stopped working when it saw the first TCP packet (or possibly anything but ICMP). I could literally ping forever without trouble, but as soon as I tried to reach the web configurator, the ping responses stopped immediately (and the browser timed out). As usual, this did not reproduce on vanilla FreeBSD (12.3, 13-RELEASE, 14-CURRENT). OPNsense 22.1 (with 13.0-RELEASE) did the same, and 21.7 (12.1) did not.
Then I noticed the OS option again, set it to Linux, and the new problem went away like the wind. If only I had not already replaced the hardware with nicer (read: much pricier) things.
-
@stephenw10 To answer your questions: No, it makes no difference which is the WAN, and yes, the six interfaces on this board are igb0 through igb5.
-
Hmm, that sounds like it could be some hardware offloading in the NIC that isn't implemented the way it's reported. You might try comparing the output of this between 2.5.2 and 2.6.0:
ifconfig -vvvm igb0
TCP Segmentation Offloading should be disabled by default.
It's possible the BIOS reports different capabilities there to Windows. That's not something I've seen on any other hardware though.
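To check the "TSO should be disabled by default" point on a given box, the offloads that are actually enabled can be read from the `options=` field (as opposed to `capabilities=`, which lists what the driver supports). A minimal POSIX sh sketch, assuming the field layout shown in the `ifconfig -vvvm` captures in this thread; the function names are hypothetical. The commented `ifconfig igb0 -tso -lro -txcsum -rxcsum` line uses standard ifconfig(8) flags to switch offloads off for a test, and that change is not persistent.

```shell
#!/bin/sh
# Print the offload flags that are currently enabled, one per line, from
# `ifconfig -m` output read on stdin. Only the first "options=" token is used;
# the later "nd6 options=" token describes IPv6 ND settings, not offloads.
enabled_options() {
    tr ' \t' '\n\n' |
        sed -n 's/^options=[0-9a-fA-F]*<\(.*\)>$/\1/p' |
        head -n 1 |
        tr ',' '\n'
}

# Report segmentation and receive offloads that are switched on.
check_offloads() {
    opts=$(enabled_options)
    for flag in TSO4 TSO6 LRO; do
        if printf '%s\n' "$opts" | grep -qx "$flag"; then
            echo "$flag enabled"
        fi
    done
}

# Usage on a live system:
#   ifconfig -m igb0 | check_offloads
# Turn offloads off for a test (standard ifconfig(8) flags, not persistent):
#   ifconfig igb0 -tso -lro -txcsum -rxcsum
```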
Steve
-
@stephenw10 Looks identical to me. This is with one version on each of the two NF692G6s, and only the one running 2.6.0 had link. I didn't notice that until I started comparing them, by which time it was too late.
pfSense 2.6.0, ACPI OS = "Intel Linux":
[2.6.0-RELEASE][root@pfSense.home.arpa]/root: ifconfig -vvvm igb0
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
	capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 00:30:18:09:13:75
	inet6 fe80::230:18ff:fe09:1375%igb0 prefixlen 64 scopeid 0x1
	inet 0.0.0.0 netmask 0xff000000 broadcast 255.255.255.255
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	supported media:
		media autoselect
		media 1000baseT
		media 1000baseT mediaopt full-duplex
		media 100baseTX mediaopt full-duplex
		media 100baseTX
		media 10baseT/UTP mediaopt full-duplex
		media 10baseT/UTP
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
pfSense 2.6.0, ACPI OS = "Windows":
[2.6.0-RELEASE][root@pfSense.home.arpa]/root: ifconfig -vvvm igb0
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
	capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 00:30:18:09:13:75
	inet6 fe80::230:18ff:fe09:1375%igb0 prefixlen 64 scopeid 0x1
	inet 0.0.0.0 netmask 0xff000000 broadcast 255.255.255.255
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	supported media:
		media autoselect
		media 1000baseT
		media 1000baseT mediaopt full-duplex
		media 100baseTX mediaopt full-duplex
		media 100baseTX
		media 10baseT/UTP mediaopt full-duplex
		media 10baseT/UTP
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
pfSense 2.5.2, ACPI OS = "Intel Linux":
[2.5.2-RELEASE][root@pfSense.home.arpa]/root: ifconfig -vvvm igb0
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
	capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 00:30:18:09:12:df
	inet6 fe80::230:18ff:fe09:12df%igb0 prefixlen 64 scopeid 0x1
	media: Ethernet autoselect
	status: no carrier
	supported media:
		media autoselect
		media 1000baseT
		media 1000baseT mediaopt full-duplex
		media 100baseTX mediaopt full-duplex
		media 100baseTX
		media 10baseT/UTP mediaopt full-duplex
		media 10baseT/UTP
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
pfSense 2.5.2, ACPI OS = "Windows" (on the first boot after changing the OS option, it spontaneously rebooted once at "Configuring LAN interface...", with no further output on the serial console):
[2.5.2-RELEASE][root@pfSense.home.arpa]/root: ifconfig -vvvm igb0
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
	capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 00:30:18:09:12:df
	inet6 fe80::230:18ff:fe09:12df%igb0 prefixlen 64 scopeid 0x1
	media: Ethernet autoselect
	status: no carrier
	supported media:
		media autoselect
		media 1000baseT
		media 1000baseT mediaopt full-duplex
		media 100baseTX mediaopt full-duplex
		media 100baseTX
		media 10baseT/UTP mediaopt full-duplex
		media 10baseT/UTP
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
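The four captures differ in MAC address, addresses and link state, so a plain diff of them is noisy. A minimal sketch for comparing only the offload-related fields between saved captures, assuming each file holds `ifconfig -vvvm` output for one interface (so `options=` comes first, then `capabilities=`, and both precede the `nd6 options=` line); the file and function names are hypothetical.

```shell
#!/bin/sh
# Print only the interface "options=" and "capabilities=" tokens from a saved
# `ifconfig -vvvm` capture, so expected differences (MAC, addresses, link
# state) drop out. head -n 2 skips the later "nd6 options=" token.
offload_fields() {
    tr ' \t' '\n\n' < "$1" | grep -E '^(options|capabilities)=' | head -n 2
}

# Succeed (exit 0) if both captures report the same offload configuration.
same_offloads() {
    [ "$(offload_fields "$1")" = "$(offload_fields "$2")" ]
}

# Usage:
#   ifconfig -vvvm igb0 > igb0-2.6.0-windows.txt   # one file per combination
#   same_offloads igb0-2.5.2-linux.txt igb0-2.6.0-windows.txt && echo identical
```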
-
Mmm, I agree, it looks to be configured the same in all cases.