1.2 RC3 and SMP on Proliant DL380
-
pfSense is running on the hard drive of the VM and the DL380. I chose the SMP kernel option when I installed it in both cases. Any clue yet what is going on?
Thanks for your time!
Library Mark
-
Just a thought: install an earlier version if possible, then use the upgrade option to go to RC-4 and try that route. It worked for me on the SMP route.
RC
-
Sorry to hijack this a little, but a little caution on a DL380:
Even though I love the DL380 G5s and have a rack of them, I ran into a really weird problem on one of them this week. It was just set up as a 2003 Server with Smartermail Enterprise over the holidays and has run flawlessly since the OS was installed. It has dual quad cores, 4 GB of RAM, and eight 146 GB SAS hard drives, and it only handles email for about 300 users. The processor load was always less than 10% and typically under 5%. Disk queues were always less than 1. Bandwidth usage was usually about 1-2 mb/s and 500 pps each direction, with short sustained periods of up to 6-8 mb/s and up to 3k pps, with 1.5k each direction.
The problem was that near the peaks of its usage, which weren't anywhere near its maximum, it would start dropping more and more packets, and then it would go completely unresponsive. I would go into the NOC and pull it up to see no problems other than that it simply wasn't sending out any packets at all, just receiving them. No errors, nothing in any logs. I would have to reboot it to get it to work again, sometimes twice. This would happen anywhere from 2-6 times a day. I switched to the other NIC, switched the cables, changed it to a different switch, reduced the number of connections...
After beating my head against a wall since this started on Tuesday, I figured it out today. Every 25 minutes our Metro E connection jumps from about 2 ms to 60 ms of latency for a couple of minutes according to the RRD. When a latency spike and an email traffic spike happened at the same time, the TCP offloading would lock up the NIC. I had to turn the TCP offloading off to fix the issue. The NICs don't like high latency with the offloading turned on.
Now I'm just waiting to find out what is causing the latency spike every 25 minutes. They're going to run some tests over the weekend, and the Intel NIC will be here Monday. The office I used to work at has the same spike every 25 minutes with the same provider but a different ring. It just never caused a problem on the Supermicro with Intel NICs I had there.
-
The DL380 I am trying pfSense on is a first-generation machine, if that makes any difference.
-
[Edited, after this father of a 4-week-old baby got some sleep]
Hi. I am a former server hardware design engineer at Compaq, and I think I can shed a little light on this. I preface this by saying that I'm a hardware and BIOS expert, not a pfSense or BSD expert.
My thought is that the information the OS needs to understand that there is more than one CPU in the system, the MPTable, is not being generated because of the OS selection you have in the BIOS. Here's a little deeper information – you can skip below for the solution.
I think this is the case because of the files you attached to your post. Your bootup messages indicate that all devices are mapped to ISA interrupts, there is no reference to an APIC, and there is no info on the second CPU. (Note, I don't mean ACPI -- the APIC, Advanced Programmable Interrupt Controller, is something very different. This device allows us to spread and prioritize interrupts among CPUs and have more than the legacy 15 interrupts, among other things.) Without an MPTable and the APIC being programmed, the OS has no way of knowing that there is more than one CPU, or how to use both CPUs. (Also note, you should be able to run SMP mode with only one processor and get better interrupt granularity on any machine with an APIC -- if the hardware and BIOS designers did their jobs well.)
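To make that check repeatable, here is a minimal Python sketch that scans saved bootup messages for the two signs described above: any reference to an APIC, and how many CPU devices are listed. The function name `smp_evidence` is made up for illustration, and the line patterns (device lines starting with `cpu0`, `cpu1`, and APIC devices named like `ioapic0`) are assumptions about how FreeBSD-style boot messages typically look, so adjust them to whatever your own dmesg output shows.

```python
import re

# Matches "APIC", "ioapic0", "lapic" as whole words; "ACPI" does not match.
APIC_RE = re.compile(r"\b(?:io|l)?apic\d*\b", re.IGNORECASE)
# Assumed format: CPU device lines begin with "cpu<N>".
CPU_RE = re.compile(r"^\s*cpu(\d+)\b")


def smp_evidence(boot_text):
    """Summarize SMP-related hints found in saved boot messages."""
    apic_mentioned = False
    cpu_ids = set()
    for line in boot_text.splitlines():
        if APIC_RE.search(line):
            apic_mentioned = True
        match = CPU_RE.match(line)
        if match:
            cpu_ids.add(int(match.group(1)))
    return {"apic_mentioned": apic_mentioned, "cpu_devices": len(cpu_ids)}
```

If the result shows no APIC reference and only one CPU device on a two-processor box, that is consistent with the MPTable not being handed to the OS. It is only a text heuristic, not proof of what the BIOS did.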
Here's the reason why Compaq does things differently. Way back before APICs and mainstream support of more than one CPU, Compaq developed proprietary ways of "gluing" CPUs together in a system. Many OS's had special drivers written to take advantage of Compaq's SMP hardware, in the days before there were standards. To support some of these legacy bits of code, Compaq's BIOS has to be prepared to tell various OS's how to handle the SMP hardware in a particular system. At power-on, the BIOS decides what information to pass to the OS, and how it's structured, based on the OS selection in the BIOS setup.
Here's the fix. My suggestion is to make sure you have the latest BIOS (Compaq did make changes to how the APIC is set up over the life of the P17 BIOS) and then adjust the OS entry in the BIOS to Linux (or whatever OS you might think is appropriate -- Linux should work, because that should set up a fully-compatible MPTable), which should cause the APIC to be programmed correctly and should let your OS, assuming it's the right SMP kernel, see both CPUs.
Your system ROM family should be P17 -- that's the Compaq System ROM identifier you should see when you turn on your system. (It stands for the 17th different PCI-based server ROM that Compaq has made. As a quirk, not all machines make it to market, and you'll sometimes see holes in the numbering sequence. Old Compaq ROMs have E identifiers, e.g. E12 or E7, for EISA ROM.) Just make sure that you've got a P17 machine (when you boot) and load up the latest P17 ROM. It looks like the latest is dated 12/18/2002. You can get it at http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=254889&prodNameId=342841&swEnvOID=1025&swLang=8&mode=2&taskId=135&swItem=MTX-UNITY-I17424
Let us know if this works, so folks in the future will know!
-
cog_engr - wow - thanks!
I did try to upgrade the BIOS with what I thought was the latest from HP. It is very hard to navigate their website and know that you have what you need.
I guess I need to run the config program again and make sure that I set up the BIOS right.
Thanks again for your insight!
LibraryMark
-
BTW- I had no idea that the BIOS cared what OS was on the system, and could not figure out why it asked. Now I do! Very interesting!
-
Today I booted up the Compaq System Configuration Utility and discovered I had the OS set to "other". I then set it to UNIX->Linux and rebooted. Still no second processor.
Then - I rebooted with a copy of 1.2 RC4 in the CD-ROM drive, choosing no ACPI - and the second processor answers up! Rock and Roll! I also disabled the floppy interface while I was at it - somewhere I read on this forum that this is a good thing to do on old Proliants.
I have disabled ACPI per http://devwiki.pfsense.org/BootOptions and other than losing power control, all is well!
While I am thinking about it, instead of putting hint.acpi.0.disabled=1 in /boot/loader.conf, how do I make the second boot option the default (no acpi) instead of the first so I can choose ACPI if I want?
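For anyone who does take the /boot/loader.conf route mentioned above, here is a small Python sketch for checking whether a loader.conf already carries that hint. It does not change which boot menu entry is the default. The function name `acpi_disabled_in_loader_conf` is made up for illustration, and the parsing is a simplification that assumes plain `key=value` lines with optional quotes and `#` comments.

```python
def acpi_disabled_in_loader_conf(conf_text):
    """Return True if the text sets hint.acpi.0.disabled to 1."""
    disabled = False
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "hint.acpi.0.disabled":
            # Keep scanning so that the last assignment in the file wins.
            disabled = value.strip().strip('"') == "1"
    return disabled
```

On the box itself you could feed it the file contents, for example `acpi_disabled_in_loader_conf(open("/boot/loader.conf").read())`.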
Thanks to all, and especially cog_engr for all the insight.
Later -
Library Mark
-
Hi Mark,
If you look at one of my topics from last year, you will see that I had the same problem.
The way you solved it is right: you have to turn off the floppy drive (because of booting), set the OS to Windows 2000 (or other), and boot the thing up.
It's an odd problem that took me some time too.
-
The weird thing is that I can't get this fixed using:
hint.acpi.0.disabled=1
or
exec="unset acpi_load"
I have to check the BIOS settings again on a test machine to be sure.
Still strange. I have the idea that I have to set another OS in the BIOS; this seems to help sometimes.