Watchguard XTM 5 Series
-
I am struggling to keep awake, so this will be a quick post in response to stephenw10 and 747Builder.
I spent a bit of time dissecting various BIOS flavours, including the XTM515. I reached the conclusion that it would be very difficult, if not impossible, to implement a truly "universal" BIOS. This is primarily because most BIOS builds seem to include baseline universal CPU ACPI information in the DSDT table and then dynamically create the appropriate SSDT tables at boot. Those SSDT tables are created by code that resides in the "SLAB" module of the AMI BIOS, the Single Link Arch BIOS. The XTM515 BIOS is missing the required code, so the easiest option is to create a separate BIOS image for each CPU model. This is effectively what I have done: the E3400 BIOS image has the required E3400 P-states programmed into the DSDT table, including the FID, VID, frequency and (estimated) power consumption. Similarly, the Q9505S BIOS image has the required Q9505S P-states programmed into the DSDT table.
For each CPU model a new DSDT table has to be created and compiled, then merged into the baseline BIOS to replace the old DSDT table, producing a CPU-specific BIOS image. The Q9505S BIOS should work fine with Q9505, Q9505S, Q9550 and Q9550S CPUs, except that the power figures will be way off for the non-S models. The FID, VID and frequencies will still be correct.
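Because each CPU model needs its own hand-built _PSS table, a small script can generate the package text from a list of per-state values and keep the numbers and comments consistent. The sketch below is a hypothetical Python helper, not the tool used for these BIOS images. It assumes the PERF_CTL value is the ratio byte in bits 15:8 and the VID in bits 7:0 (this matches the posted E3400 values, e.g. 13x with VID 0x24 gives 0x0D24) and reuses the 10 us latencies from the posted tables.

```python
# Hypothetical helper (not the tool used for these images): build the ASL text
# of a _PSS object from a per-CPU table of P-states.
from dataclasses import dataclass


@dataclass
class PState:
    mhz: int       # core frequency in MHz
    mw: int        # estimated power in mW
    fid_byte: int  # ratio byte for PERF_CTL bits 15:8 (assumed layout)
    vid: int       # VID for PERF_CTL bits 7:0 (assumed layout)


def perf_ctl(p: PState) -> int:
    """Pack the ratio byte and VID into the 16-bit PERF_CTL value."""
    return (p.fid_byte << 8) | p.vid


def pss_asl(states: list[PState], latency_us: int = 10) -> str:
    """Return ASL text for Name (_PSS, ...), listed fastest state first."""
    lines = [f"Name (_PSS, Package (0x{len(states):02X})", "{"]
    for i, p in enumerate(states):
        ctl = perf_ctl(p)
        sep = "," if i < len(states) - 1 else ""  # no comma after the last package
        lines += [
            "    Package (0x06)",
            "    {",
            f"        {p.mhz},  // f in MHz",
            f"        {p.mw},  // P in mW",
            f"        {latency_us},  // Transition latency in us",
            f"        {latency_us},  // Bus Master latency in us",
            f"        0x{ctl:08X},  // written to PERF_CTL; ratio byte 0x{p.fid_byte:02X}, vid 0x{p.vid:02X}",
            f"        0x{ctl:08X}   // expected in PERF_STATUS after the transition",
            "    }" + sep,
        ]
    lines.append("})")
    return "\n".join(lines)


# E3400 values as posted in this thread (fastest first).
E3400 = [
    PState(2600, 65000, 0x0D, 0x24),
    PState(2000, 53800, 0x0A, 0x1E),
    PState(1600, 47500, 0x08, 0x1A),
    PState(1200, 42000, 0x06, 0x16),
]

if __name__ == "__main__":
    print(pss_asl(E3400))
```

The printed text still has to be placed inside each Processor block, compiled (for example with ACPICA's iasl) and merged into the BIOS.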
As an example, I am enclosing below a single CPU core DSDT excerpt for each of the above two CPUs. Please ignore the FID and VID numbers quoted in the comments as I have not fixed them yet to align with the actual FID and VID values in the code. This is left-over copy & paste-a-tis and my chronic lack of time!
CPU1 for E3400 with 4 P-states:
```
Processor (CPU1, 0x01, 0x00000810, 0x06)
{
    Name (_PPC, 0x00)
    Name (_PCT, Package (0x02)
    {
        ResourceTemplate ()
        {
            Register (FFixedHW,     // PERF_CTL
                0x10,               // Bit Width
                0x00,               // Bit Offset
                0x00000199,         // Address
                ,)
        },
        ResourceTemplate ()
        {
            Register (FFixedHW,     // PERF_STATUS
                0x10,               // Bit Width
                0x00,               // Bit Offset
                0x00000198,         // Address
                ,)
        }
    })
    Name (_CST, Package (0x02)
    {
        0x01,
        Package (0x04)
        {
            ResourceTemplate ()
            {
                Register (FFixedHW,
                    0x01,               // Bit Width
                    0x02,               // Bit Offset
                    0x0000000000000000, // Address
                    0x01,               // Access Size
                    )
            },
            1,      // C State Type
            2,      // Transition latency in us
            25000   // Power Consumption in mW
        }
    })
    Name (_PSS, Package (0x04)  // Values below for Intel Celeron E3400
    {
        Package (0x06)
        {
            2600,       // f in MHz
            65000,      // P in mW
            10,         // Transition latency in us
            10,         // Bus Master latency in us
            0x00000D24, // value written to PERF_CTL; fid=13, vid=36
            0x00000D24  // value of PERF_STATE after successful transition; fid=13, vid=36
        },
        Package (0x06)
        {
            2000,       // f in MHz
            53800,      // P in mW
            10,         // Transition latency in us
            10,         // Bus Master latency in us
            0x00000A1E, // value written to PERF_CTL; fid=13, vid=36
            0x00000A1E  // value of PERF_STATE after successful transition; fid=13, vid=36
        },
        Package (0x06)
        {
            1600,       // f in MHz
            47500,      // P in mW
            10,         // Transition latency in us
            10,         // Bus Master latency in us
            0x0000081A, // value written to PERF_CTL; fid=13, vid=36
            0x0000081A  // value of PERF_STATE after successful transition; fid=13, vid=36
        },
        Package (0x06)
        {
            1200,       // f in MHz
            42000,      // P in mW
            10,         // Transition latency in us
            10,         // Bus Master latency in us
            0x00000616, // value written to PERF_CTL; fid=13, vid=36
            0x00000616  // value of PERF_STATE after successful transition; fid=13, vid=36
        }
    })
}
```
CPU1 for Q9505S with 6 P-states:
```
Processor (CPU1, 0x01, 0x00000810, 0x06)
{
    Name (_PPC, 0x00)
    Name (_PCT, Package (0x02)
    {
        ResourceTemplate ()
        {
            Register (FFixedHW,     // PERF_CTL
                0x10,               // Bit Width
                0x00,               // Bit Offset
                0x00000199,         // Address
                ,)
        },
        ResourceTemplate ()
        {
            Register (FFixedHW,     // PERF_STATUS
                0x10,               // Bit Width
                0x00,               // Bit Offset
                0x00000198,         // Address
                ,)
        }
    })
    Name (_CST, Package (0x02)
    {
        0x01,
        Package (0x04)
        {
            ResourceTemplate ()
            {
                Register (FFixedHW,
                    0x01,               // Bit Width
                    0x02,               // Bit Offset
                    0x0000000000000000, // Address
                    0x01,               // Access Size
                    )
            },
            1,      // C State Type
            2,      // Transition latency in us
            12000   // Power Consumption in mW
        }
    })
    Name (_PSS, Package (0x06)  // Values below for Intel Core 2 Quad Q9505S
    {
        Package (0x06)
        {
            2833,       // f in MHz
            65000,      // P in mW
            10,         // Transition latency in us
            10,         // Bus Master latency in us
            0x0000481E, // value written to PERF_CTL; fid=48, vid=1E
            0x0000481E  // value of PERF_STATE after successful transition; fid=48, vid=1E
        },
        Package (0x06)
        {
            2666,       // f in MHz
            60100,      // P in mW
            10,         // Transition latency in us
            10,         // Bus Master latency in us
            0x0000081C, // value written to PERF_CTL; fid=9, vid=28
            0x0000081C  // value of PERF_STATE after successful transition; fid=9, vid=28
        },
        Package (0x06)
        {
            2500,       // f in MHz
            56400,      // P in mW
            10,         // Transition latency in us
            10,         // Bus Master latency in us
            0x0000471B, // value written to PERF_CTL; fid=6, vid=22
            0x0000471B  // value of PERF_STATE after successful transition; fid=6, vid=22
        },
        Package (0x06)
        {
            2333,       // f in MHz
            52000,      // P in mW
            10,         // Transition latency in us
            10,         // Bus Master latency in us
            0x00000719, // value written to PERF_CTL; fid=9, vid=28
            0x00000719  // value of PERF_STATE after successful transition; fid=9, vid=28
        },
        Package (0x06)
        {
            2166,       // f in MHz
            48600,      // P in mW
            10,         // Transition latency in us
            10,         // Bus Master latency in us
            0x00004618, // value written to PERF_CTL; fid=9, vid=28
            0x00004618  // value of PERF_STATE after successful transition; fid=9, vid=28
        },
        Package (0x06)
        {
            2000,       // f in MHz
            44700,      // P in mW
            10,         // Transition latency in us
            10,         // Bus Master latency in us
            0x00000616, // value written to PERF_CTL; fid=9, vid=28
            0x00000616  // value of PERF_STATE after successful transition; fid=9, vid=28
        }
    })
}
```
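To check the PERF_CTL values independently of the stale comments, they can be decoded. The Python sketch below assumes bits 12:8 hold the integer bus ratio and that 0x40 in the high byte adds a half ratio; that reading is inferred from the posted values (0x48 gives 8.5x, i.e. 2833 MHz on the Q9505S) rather than taken from Intel documentation. The bus clocks of 200 MHz (E3400, 800 MT/s FSB) and 333.33 MHz (Q9505S, 1333 MT/s FSB) are likewise assumptions supplied here.

```python
from fractions import Fraction


def decode(perf_ctl: int) -> tuple[Fraction, int]:
    """Split a PERF_CTL value into (bus ratio, VID) using the assumed layout."""
    hi = (perf_ctl >> 8) & 0xFF
    ratio = Fraction(hi & 0x1F)      # integer part of the ratio (FID)
    if hi & 0x40:                    # assumed "+0.5 ratio" flag
        ratio += Fraction(1, 2)
    return ratio, perf_ctl & 0xFF


def check(table: list[tuple[int, int]], fsb_mhz: Fraction) -> bool:
    """Print each entry and report whether ratio x FSB matches the listed MHz.

    The posted tables truncate, e.g. 6.5 x 333.33 MHz is listed as 2166.
    """
    all_ok = True
    for mhz, ctl in table:
        ratio, vid = decode(ctl)
        ok = int(ratio * fsb_mhz) == mhz
        all_ok = all_ok and ok
        print(f"{mhz:5d} MHz  0x{ctl:04X}  ratio={float(ratio):4.1f}  "
              f"vid=0x{vid:02X}  {'OK' if ok else 'MISMATCH'}")
    return all_ok


# (MHz, PERF_CTL) pairs copied from the two _PSS tables above.
E3400 = [(2600, 0x0D24), (2000, 0x0A1E), (1600, 0x081A), (1200, 0x0616)]
Q9505S = [(2833, 0x481E), (2666, 0x081C), (2500, 0x471B),
          (2333, 0x0719), (2166, 0x4618), (2000, 0x0616)]

if __name__ == "__main__":
    check(E3400, Fraction(200))        # assumed 200 MHz bus clock
    check(Q9505S, Fraction(1000, 3))   # assumed 333.33 MHz bus clock
```

Under these assumptions every listed frequency matches its PERF_CTL value, so only the comments are out of date (for example 0x0A1E decodes to 10x with VID 0x1E).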
So, in summary, the two BIOS images I created should only be installed on XTM units equipped with the matching CPUs!
Peter.
-
Ah, that looks very familiar. Though I recall having to add a line for each CPU core, with the other cores effectively just linking to the first core for the data.
Hard to imagine why that code was not included by default really. Hmm.
Steve
-
I don't really get it either, especially considering that I imagine the AMIBIOS8 development kit would come with all the required code modules included. Perhaps it was an attempt at squeezing the last bit of performance out of the hardware? I have seen some claims that under some workloads SpeedStep can cause a performance hit on the order of 20%, but I find that very hard to believe. My own primitive checks show a potential hit of anywhere from 0.5% to perhaps 2.5%, but much of that could be down to measurement accuracy.
I did a few additional checks on the E3400 box with my BIOS mods today, and SpeedStep is definitely working correctly. I can see the frequency and core voltage changing, and all of it is reflected in benchmark testing…
Peter.
-
I tried modifying my BIOS for a Q9650 but don't think I was successful. While it didn't break anything, it didn't seem to stop the "not supported" errors in the log at startup. :(
My microcode was a bit older.
-
Here is the link to my BIOS. Again, this is for the final hardware version of the XTM5 that came with the E3400 CPU. Included are three BIOS images: without SpeedStep (for all CPUs), for the E3400 CPU and for the Q9505S CPU. The Q9505S image should also work with the Q9550S CPU. I also included the corresponding compiled ACPI_AML modules and the ACPI_AML source, where you can inspect my changes.
https://www.dropbox.com/s/aom4whlcg2rg6ic/XTM515-BIOS.zip?dl=0
I will be tweaking the Q9505S image a bit more, since I now have good power measurements from my box in all six P-states under full load.
Disclaimer: These work fine for me but USE AT YOUR OWN RISK!
Peter.
-
Nice, you caught something I missed there! :)
```
acpi_dsdt_load="YES"
acpi_dsdt_name="/conf/e3400.aml"
```
```
dev.est.1.freq_settings: 2600/65000 2000/53800 1600/47500 1200/42000
dev.est.0.freq_settings: 2600/65000 2000/53800 1600/47500 1200/42000
dev.cpu.0.freq_levels: 2600/65000 2000/53800 1600/47500 1200/42000
dev.cpu.0.freq: 1200
```
Wrong values for my CPU, but a relatively easy fix!
Steve
Edit: Of course the 13x multiplier from the E3400 would be trying to drive an E8400 at about 4.3GHz… which seems unlikely to succeed!
-
Does seem to actually work though:
```
[2.4.3-RELEASE][admin@xtm5.stevew.lan]/root: sysctl dev.cpu.0.freq=1200
dev.cpu.0.freq: 2600 -> 1200
[2.4.3-RELEASE][admin@xtm5.stevew.lan]/root: openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 26453752 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 7755366 aes-128-cbc's in 2.98s
Doing aes-128-cbc for 3s on 256 size blocks: 2042954 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 518803 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 64976 aes-128-cbc's in 3.01s
OpenSSL 1.0.2m-freebsd 2 Nov 2017
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     141086.68k   166314.03k   174332.07k   177084.76k   176966.95k
[2.4.3-RELEASE][admin@xtm5.stevew.lan]/root: sysctl dev.cpu.0.freq=2600
dev.cpu.0.freq: 1200 -> 2600
[2.4.3-RELEASE][admin@xtm5.stevew.lan]/root: openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 39744717 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 11696373 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 3068062 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 777909 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 97684 aes-128-cbc's in 3.01s
OpenSSL 1.0.2m-freebsd 2 Nov 2017
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     211971.82k   249522.62k   261807.96k   265526.27k   266049.61k
```
The actual result is 1.5x faster. I suspect it is limited to the E8400's 9x maximum multiplier, so it runs at 6x with the "1200" setting and 9x with the "2600" setting. That is actually 2GHz to 3GHz. Fun :)
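The multiplier explanation can be sanity-checked with a little arithmetic: if the clock really goes from 6x to 9x, CPU-bound throughput should scale by 9/6 = 1.5. This Python sketch uses the aes-128-cbc figures from the two openssl runs; the 333.33 MHz E8400 bus clock and the assumption that the benchmark is purely CPU-bound are supplied here, not measured.

```python
# Measured aes-128-cbc throughput (1000s of bytes/s) from the two runs above.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
AT_1200 = [141086.68, 166314.03, 174332.07, 177084.76, 176966.95]
AT_2600 = [211971.82, 249522.62, 261807.96, 265526.27, 266049.61]

FSB_MHZ = 1000 / 3           # assumed E8400 bus clock (1333 MT/s FSB)
MIN_RATIO, MAX_RATIO = 6, 9  # multiplier limits suggested in the post


def speedups(slow: list[float], fast: list[float]) -> list[float]:
    """Per-block-size throughput ratio, fast run over slow run."""
    return [f / s for s, f in zip(slow, fast)]


if __name__ == "__main__":
    expected = MAX_RATIO / MIN_RATIO
    print(f"expected {expected:.2f}x "
          f"({MIN_RATIO * FSB_MHZ:.0f} -> {MAX_RATIO * FSB_MHZ:.0f} MHz)")
    for size, r in zip(BLOCK_SIZES, speedups(AT_1200, AT_2600)):
        print(f"{size:5d}-byte blocks: {r:.3f}x")
```

Every block size lands within about 0.3% of 1.5x, which fits the 6x/9x explanation.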
Steve
-
Nice to hear! If you have a locked CPU, you are limited to the minimum and maximum FID and VID values of that particular CPU. As I understand it, the Core 2 Extreme CPUs are unlocked, so you could tweak the values freely.
I continued to tinker a bit. My latest changes implement P-state dependencies (_PSD), although that does not seem to make any obvious difference.
When I have a moment I will write down some quick notes, 747Builder. Yes, I did update the microcode, but only for the CPUID that I am using. I don't see why one could not take a recent AMI BIOS for some other G41-based motherboard and simply transplant its whole CPU microcode module. Or you could do what I did and replace only the microcode for the CPUID that you are using.
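For reference, an ACPI _PSD object is a package holding one five-element dependency package: number of entries (5), revision (0), domain number, coordination type (0xFC SW_ALL, 0xFD SW_ANY, 0xFE HW_ALL) and the number of processors in the domain. The post doesn't say which values were used, so this Python sketch is hypothetical: it assumes all cores share domain 0 with SW_ALL coordination and just prints ASL text to add to each Processor block.

```python
# Hypothetical sketch: print a _PSD object to add to each Processor block.
# Coordination type codes are the ones defined by the ACPI specification.
COORD_TYPES = {"SW_ALL": 0xFC, "SW_ANY": 0xFD, "HW_ALL": 0xFE}


def psd_asl(domain: int, num_procs: int, coord: str = "SW_ALL") -> str:
    """ASL text for Name (_PSD, ...) with one dependency package."""
    if coord not in COORD_TYPES:
        raise ValueError(f"unknown coordination type {coord!r}")
    return "\n".join([
        "Name (_PSD, Package (0x01)",
        "{",
        "    Package (0x05)",
        "    {",
        "        0x05,  // NumEntries",
        "        0x00,  // Revision",
        f"        0x{domain:02X},  // Domain",
        f"        0x{COORD_TYPES[coord]:02X},  // CoordType ({coord})",
        f"        0x{num_procs:02X}   // NumProcessors in this domain",
        "    }",
        "})",
    ])


if __name__ == "__main__":
    # Assumption: all four cores of a quad-core CPU share domain 0.
    print(psd_asl(domain=0, num_procs=4))
```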
Peter.
-
Yes, the max and min multipliers for the E8400 (at least the one I have) are 9x and 6x, so only 4 speeds, unless it supports half steps; I haven't tested that.
Lots of warnings when you compile that. I think I went through and fixed them last time around… too long ago! ::)
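With a locked multiplier, the available P-states are just the ratios between the minimum and maximum multiplied by the bus clock. This hypothetical Python sketch lists them for the 6x to 9x range quoted above, assuming a 333.33 MHz E8400 bus clock (1333 MT/s FSB); half steps are an option only because their support is untested. It also shows what a 13x entry copied from an E3400 table would request on this bus.

```python
from fractions import Fraction


def speeds(min_ratio: int, max_ratio: int, fsb_mhz: Fraction,
           half_steps: bool = False) -> list[tuple[Fraction, int]]:
    """(ratio, MHz) pairs from max to min; MHz truncated as in the _PSS tables."""
    step = Fraction(1, 2) if half_steps else Fraction(1)
    result = []
    ratio = Fraction(max_ratio)
    while ratio >= min_ratio:
        result.append((ratio, int(ratio * fsb_mhz)))
        ratio -= step
    return result


FSB = Fraction(1000, 3)  # assumed E8400 bus clock: 333.33 MHz (1333 MT/s)

if __name__ == "__main__":
    for r, mhz in speeds(6, 9, FSB):
        print(f"{float(r):3.1f}x -> {mhz} MHz")
    print("with half steps:", len(speeds(6, 9, FSB, half_steps=True)), "states")
    # A 13x entry copied from an E3400 table would ask for:
    print(f"13x -> {int(13 * FSB)} MHz")
```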
Steve
-
Yes, seems to. But I've just seen a horrible typo! :-[
Edit: OK this looks better. Works OK here but YMMV. To be honest it doesn't do much from my testing. Maybe 1W less, at idle at least.
Steve
-
I have not attempted to fiddle with the rest of the code to correct the factory errors. I find the code very confusing with my limited coding experience; I have done mostly C in the past. I tried the Intel reference manual, but that was not much help either. So if you recall how you fixed the errors, it would be great to see a diff.
I have now confirmed that the BIOS (and the CPUs, of course) support the C1E state, so when idle the power is already as low as it can get, even without SpeedStep. The benefit will be at partial loads, but I'm not sure how to quantify it. I looked at it this way: if I can spend a bit of time now and learn in the process, it's worth it, even if it saves me only a few $ over the deployment time. At my rates here, saving 1W for a year is worth about $1 ::)
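The "$1 per watt-year" rule of thumb implies a particular tariff: a constant 1W load running all year uses 8.76 kWh. This small Python sketch does that arithmetic using only the figures in the post (leap years ignored).

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h; leap years ignored


def kwh_per_year(watts: float) -> float:
    """Energy used by a constant load running all year."""
    return watts * HOURS_PER_YEAR / 1000


def implied_tariff(watts: float, dollars_per_year: float) -> float:
    """Price per kWh at which a constant load costs the given amount a year."""
    return dollars_per_year / kwh_per_year(watts)


if __name__ == "__main__":
    print(f"1 W for a year = {kwh_per_year(1):.2f} kWh")
    print(f"$1 per W-year -> about ${implied_tariff(1, 1.0):.3f}/kWh")
```

So the rule of thumb corresponds to roughly $0.11 per kWh.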
-
$1, it all counts! ;D
I was never really doing it for the power saving. I came to the same conclusion about C states. My own coding skills are nothing special; I think I used trial and error last time.
Steve
-
Has anyone got the 2.4.3 update working on an XTM 5 yet? For me the 2.4.2 update works fine, but if I try updating from 2.4.1 to 2.4.3 the unit stops booting. Any ideas?
-
@747builder: Yes. Loading the compiled .aml file at boot works as long as you're running a BIOS with SpeedStep unlocked and enabled.
@diesel678: Yes, running 2.4.3 here. Had no issues upgrading.
Edit: Typo
-
I measured the power consumption of my youngest XTM5 box (manufactured in 2015) with the Q9505S CPU in all power states, with all cores loaded using mprime. This unit is also equipped with an 80+ PSU made by FSP, as opposed to my other two units, which have Seventeam PSUs. The idle power consumption on this box is only 37W.
But in my measurements I also discovered why Lanner / WatchGuard might have disabled SpeedStep. Basically, it looks like the reduction in box power draw at the lower P-states is smaller than the increase in processing time caused by the reduced frequency, resulting in a net increase in energy consumption rather than a decrease! See the attachment. A good practical case study would be to measure the steady-state power consumption of an XTM5 box in an actual installation with both BIOS configurations, for example a unit doing all its routing / firewalling / UTM duties under controlled traffic…
Edit: I had a momentary lapse of reason and the numbers in the attachment are obviously incorrect, since they assume that when the box is idle it consumes 0W as opposed to 37W. I will post corrected numbers later tonight, but there is a notable overall power saving, so implementing SpeedStep remains worthwhile.
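The correction comes down to comparing energy over the same time window: at a lower clock a job takes longer at a lower power draw, but the box also draws its idle power for whatever part of the window it isn't busy. This Python sketch illustrates that bookkeeping. Only the 37W idle figure comes from the post; the loaded power draws and the job length are hypothetical placeholders, not the measurements in the attachment, and run time is assumed to scale with 1/frequency.

```python
IDLE_W = 37.0  # idle draw reported in the post for the FSP-PSU box


def window_energy_wh(load_w: float, busy_h: float, window_h: float,
                     idle_w: float = IDLE_W) -> float:
    """Wh used over the window: loaded while busy, idle for the remainder."""
    if busy_h > window_h:
        raise ValueError("job does not fit in the window")
    return load_w * busy_h + idle_w * (window_h - busy_h)


def busy_hours(base_h: float, base_mhz: float, mhz: float) -> float:
    """Job length at a lower clock, assuming run time scales with 1/f."""
    return base_h * base_mhz / mhz


if __name__ == "__main__":
    WINDOW_H = 2.0
    FAST_MHZ, SLOW_MHZ = 2833, 2000
    FAST_W, SLOW_W = 90.0, 70.0   # hypothetical loaded draws, NOT measured
    fast_h = 1.0                  # hypothetical job length at FAST_MHZ
    slow_h = busy_hours(fast_h, FAST_MHZ, SLOW_MHZ)
    for idle in (0.0, IDLE_W):    # 0 W reproduces the mistaken assumption
        e_fast = window_energy_wh(FAST_W, fast_h, WINDOW_H, idle)
        e_slow = window_energy_wh(SLOW_W, slow_h, WINDOW_H, idle)
        print(f"idle {idle:4.1f} W: fast {e_fast:.1f} Wh, slow {e_slow:.1f} Wh")
```

With these made-up numbers, ignoring the idle draw makes the slower P-state look worse, while including it reverses the comparison; whether that holds for the real box depends on the actual measurements.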
![XTM5 Speedstep Power.jpg](/public/imported_attachments/1/XTM5 Speedstep Power.jpg) -
OK, fresh off the press, here are the corrected power consumption numbers and net energy use for the XTM5 with the Q9505S in each available processor power state. Note that I was able to load the CPU more consistently by selecting a different mprime torture workload, and I repeated each test several times to reduce measurement variability. Also note that I reformatted the table to make it easier to understand. The numbers look very good and I will definitely keep SpeedStep enabled in my deployed box. Under partial loads the energy savings are substantial!
Peter.
Edit: Cleaned-up the attached image for additional clarity.
![XTM5 Speedstep Energy Usage.jpg](/public/imported_attachments/1/XTM5 Speedstep Energy Usage.jpg)