2.1-RC1 issues with modern hardware?
-
So, I seem to have narrowed this down to an issue with cd9660 and/or the liveCD customizations on 6-series and 7-series systems, and specifically with threading, of all things. At least, that's the only thing I can figure it is, after the testing I've done. To summarize:
Bootup consistently hangs on 2.1-RC at the second "configuring firewall…" step, but -only- on the 6-series and 7-series systems that I've tested. Older systems are fine (I've tested on 5-series and on Atoms). After a suggestion by another forum member, I have since acquired and tested a Realtek 8111-based PCIe card as a further sanity check, and have verified that the hang also occurs with that card in place of the Intel variants (82579V, Gigabit CT, Pro1000/PT) I had been using.
-
A previous poster suggested this was an ATA/NIC IRQ conflict, but that does not seem to be the case: leaving all ATA devices in place (hard drive, CD) and booting from memstick allows a successful boot. Conversely, disabling ATA entirely and booting from a USB-based CD drive produces the same hang at the exact same spot. Also, once an install has been done via memstick, booting from the hard drive works reliably, with normal functionality.
-
There is a similar issue that manifests with 2.0.3, but not as consistently, again involving the livecd, so this doesn't seem to be a super-new issue.
-
FreeBSD 8.3-RELEASE-dvd1 boots without any issues, and will allow a successful install to hard disk, whether from dvd media install or ftp/network install.
Using system bios to enable/disable cores/threads gives the following results:
-
Four cores, 8 threads = hang
-
Four cores, 4 threads = hang
-
Two cores, 4 threads = boot ok multiple times
-
Two cores, 2 threads = boot ok multiple times
-
Three cores, 3 threads = hang
I'm not sure where the above leaves us, other than that I can only think it's some quirk of the liveCD customization and/or the liveCD itself that causes these issues. It's only the LiveCD that produces the problem, regardless of what interface it's using to connect, and only (as best I can tell) if more than two cores are present. As an upside, using the memstick boot to get around this does work consistently, so perhaps that'll need to be the standard install method on newer systems, though it'd be useful to convey that to folks if that's the case.
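For anyone who wants to sanity-check the "it's the core count, not the thread count" reading, the five BIOS results above can be run against both ideas in a few lines of Python. This is only a sketch: the function names and the idea of a simple threshold rule are my own assumptions, and the data is nothing more than the five results listed above.

```python
# The five BIOS test results from this post: (cores, threads, outcome).
OBSERVATIONS = [
    (4, 8, "hang"),
    (4, 4, "hang"),
    (2, 4, "ok"),
    (2, 2, "ok"),
    (3, 3, "hang"),
]


def hang_if_more_than_two_cores(cores, threads):
    """Hypothetical rule: the LiveCD hangs when more than two cores are enabled."""
    return cores > 2


def make_thread_rule(limit):
    """Hypothetical rule: the LiveCD hangs when more than `limit` threads are enabled."""
    def rule(cores, threads):
        return threads > limit
    return rule


def consistent(rule):
    """True if the rule predicts hang/ok correctly for every observation."""
    return all(
        rule(cores, threads) == (outcome == "hang")
        for cores, threads, outcome in OBSERVATIONS
    )


# Does the core-count rule explain every result?
core_rule_fits = consistent(hang_if_more_than_two_cores)

# Does any simple thread-count threshold explain every result?
thread_rule_fits = any(consistent(make_thread_rule(n)) for n in range(1, 9))

print("core-count rule fits all results:", core_rule_fits)
print("some thread-count threshold fits all results:", thread_rule_fits)
```

The thread-count idea fails because 2 cores / 4 threads boots while 3 cores / 3 threads hangs, so no single thread threshold can separate the two groups. Five data points is obviously a small sample, so this only shows the results are consistent with a core-count cause, not that it is proven.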
This is about the limit of what I can test here, at this point, unless one of the devs gets involved…
-
-
I too have this problem. I have an MSI P55-GD80 w/ Intel i7 860, 16GB, onboard NIC, and a SATA CD drive, using pfSense-LiveCD-2.1-RC2-amd64-20130906-2049.iso
Following the same information in this thread, I can repeat the same failure and success.
All cores on, HT on: Hangs at the second Configuring Firewall, right after Starting NTPD.
Turn off HT and limit to 2 cores, and I have a system that boots off cdrom each and every time.
-
I had similar hanging problems and the reason was BIOS option called "large disk access mode" (or something like this as I remember), which had two options "DOS" and "Other":
-
when it's set to "DOS", 2.0.3 works, but after the upgrade to 2.1 the server hangs during boot
-
when it's set to "Other", 2.0.3 hangs a few steps after setup begins, while 2.1 works OK
-
-
Except in this case, the ATA subsystem can be disabled entirely, and the issue will still occur at the "Configuring firewall.." step. The large disk option is interesting to note, but I'm not sure a lot of the newer BIOS variants even offer it anymore. Also, booting from a memstick to do the install bypasses this issue entirely, and the issue does not occur once pfSense (either 2.0.x or 2.1) is actually installed to disk. Again, interesting to note, but I'm pretty sure that's an entirely separate issue from the one in question (which seems to be a quirk of the LiveCD and the system core count).
-
-
It's probably worth noting that Lightningfire and I spent quite a bit of time working out this issue off forum.
We started with why one of my 7-series systems worked just fine and his didn't, and then tried to figure out why my old i7 wouldn't work when a similar machine of his would. We were quite stumped until we joked that it had something to do with too many cores. Which, when faced with no other similarities… we decided to test.
My newer 7 series was an i3, his an i7. His old rig a dual core, while mine was an i7.
We spent a few hours looking at different permutations of what did and did not work. It's also worth noting that Lightning opened a bug report on redmine, which is probably quite clear.
http://redmine.pfsense.com/issues/3187
-
Give 2.1-RC2 a try.
Been using since day 5, working fine so far.
-
We were.
Do you have your pfSense box running an i7? Was your WAN interface connected during the install? Did you install from a CD-ROM, or by another means? If all these things are true, which chipset are you using?
-
Hi all,
@lightningfire: Your post about the multiple cores and threads rang a bell with me: we had some issues with our new server-grade hardware (IBM servers), too, hanging at some point during the boot procedure. With luck I had captured an error message in the dmesg earlier, indicating that the problem was with the mbuffers of the Intel NICs. Those get really screwed up on new CPUs with multiple cores AND multithreading: the system sees something like 8 or 16 cores and assigns buffers per core.
See: https://redmine.pfsense.org/issues/1221
What fixed (or at least worked around) the issue for us was to define the three mentioned configuration parameters in /boot/loader.conf. Afterwards the system booted just fine. CD boot was also a problem, but when intercepting the boot and defining those variables manually before booting, CD boot worked, too. Perhaps it's worth a try?
Greets
Jens
-
I don't think it's the same problem, but it's interesting to note. For one, Hyperthreading doesn't seem to affect this problem; only core count does. So 2 cores 4 threads will boot fine, but 3 cores 3 threads, or 4 cores 4 threads, will not. Also, I've replicated the issue using a Realtek 8111 card as well. While I'm not exactly fond of Realtek, seeing the issue occur in exactly the same way at exactly the same spot, only on the LiveCD and only resolvable by changing the core count, would seem to say that this is a different problem.
I'm also pretty sure I've not seen any of the "cannot receive structures" issue. The odd thing, if anything, is that the interface (well, both interfaces if I've got multiple NICs enabled) seems to be working at that point. I didn't take full notes for -that- part of the procedure, but from memory I'm pretty sure I had my laptop connected to the LAN side and it was getting packets OK. It's just that on the LiveCD the bootup never finishes, so the webConfigurator is never accessible, etc. It's interesting to note, though, for sure.
-
Ah that's good to hear - somehow anyways :-
But it's interesting and a bit disturbing to see another issue arise which is more or less depending on multi-cores and/or threads. Hopefully that won't be the start of a trend.
Greets