Topton N100 Reporting 402 MHz

AnonymousRetard

@TheNarc Nice! From my understanding you are not forcing it to run at 9W all the time, but rather allowing it to use 9W of power for an unlimited amount of time and as the maximum in "low load scenarios". When the CPU usage goes up, the CPU will be allowed to use the PL2 power for a certain time limit, which should be configurable as well. The PL2 limit is supposed to handle bursty loads, since it usually takes a while before the temperature builds up enough to become a problem.
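
To make the PL1/PL2 interaction concrete, here is a minimal Python sketch of one simplified model: the CPU may draw up to PL2 while a running average of its power stays below PL1, and is clamped to PL1 otherwise. The function name, the exponentially weighted average and the `tau_s` window parameter are assumptions for illustration only, not the N100's actual firmware algorithm.

```python
def simulate_power_limits(demand_w, pl1_w, pl2_w, tau_s, dt_s=1.0):
    """Return the power granted at each time step under a simplified PL1/PL2 model.

    demand_w : list of power the workload would like to draw, in watts
    pl1_w    : sustained power limit (assumed to apply to the running average)
    pl2_w    : short-term burst limit
    tau_s    : averaging time window in seconds (hypothetical parameter, must be > 0)
    """
    if tau_s <= 0:
        raise ValueError("tau_s must be positive in this simplified model")
    alpha = dt_s / tau_s          # weight given to the newest power sample
    average_w = 0.0               # running average of granted power
    granted = []
    for want_w in demand_w:
        # Allow bursting to PL2 only while the average is still below PL1.
        cap_w = pl2_w if average_w < pl1_w else pl1_w
        power_w = min(want_w, cap_w)
        average_w += alpha * (power_w - average_w)
        granted.append(power_w)
    return granted


if __name__ == "__main__":
    # A steady 20 W demand with PL1 = 9 W, PL2 = 15 W and a 4 s window.
    print(simulate_power_limits([20.0] * 8, pl1_w=9.0, pl2_w=15.0, tau_s=4.0))
```

With these example numbers the model grants 15 W for the first four steps and then settles at 9 W, which is the "burst, then fall back" behaviour described above. A light load that never asks for more than PL1 is simply granted what it asks for.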

Sometimes, though, if the CPU is constrained by thermals you will get better performance by setting a lower power limit, as efficiency usually drops at higher power draws. As an example, imagine the thermal limit gets hit at a sustained 10W power draw and the CPU then downclocks. In that scenario you might get better performance if the CPU instead runs for long periods at 6-9W, if that keeps it from hitting the thermal limit where it starts downclocking.

It shouldn't be dangerous to the CPU to set both of the limits really high, like putting both at 25W. It will limit itself because of thermals anyway but your performance is likely to drop at some point because of what I described above. Another problem though is that it could also increase the ambient temperature in the box too much which the other components might not like. I don't think the NVMe drive inside or the RAM has any heatsinks in our boxes but I haven't double-checked. But I'm sure neither is connected to the big heatsink on the top of the case. A third problem could be that the power supply isn't designed for too large power draws for longer amounts of time but in your unlocked BIOS it looks like the total power draw is also controlled and constrained by a separate setting (the 65W AC Brick setting). Normal PSUs though just shut down if they get overloaded (OCP).

stephenw10

Yes I'd expect that to be a limit. Under low loads I wouldn't expect to see much difference.

TheNarc

@AnonymousRetard Thanks for the explanations. That all makes sense, although the curious thing is that in the BIOS there's a time window setting for PL1, but not for PL2. One explanation of that PL1 time window setting I found though is:
"In BIOS settings, the PL1 time window refers to the time window over which the average CPU core power must be below the Turbo Boost Power Max."

I just left it at 0 because that's apparently the CPU default. But yeah I'll definitely need to keep an eye on temps. I bought a unit without RAM or storage because I already had an unused 8GB stick of DDR5 and just grabbed a 128GB nvme, but neither have heatsinks. This box is going to live on a wire shelf so I may investigate running with the bottom cover off and a fan aimed up at it from below. It is clearly going to need to have high reliability so that's the next test!

AnonymousRetard

@TheNarc Yeah it seems a bit strange that there's a PL1 time window, but it kind of sounds like that one could be the time limit I'm talking about anyway. "Turbo Boost Power Max" is likely to be the PL2 limit, so it could be that this is the time after which it is forced back to the PL1 limit. Or perhaps it's some kind of "cool down" time after which the CPU is once again allowed to go from the PL1 limit to the PL2 limit. Not sure at all. Don't really know what a value of 0 means either.

Good luck with your box and your stability testing! That's what I'm currently doing with mine, as I mentioned before it has unfortunately had two crashes so far and it's not yet clear why, but my biggest suspicion is either a bad SODIMM stick or that it gets too hot and then becomes unstable. This is why I'm now stability testing with this fan. I'd prefer not to have a fan though, perhaps I could also buy and attach some heatsinks to the RAM or just try as well with the bottom cover off and no fan.

Unfortunately, unlike CPUs, RAM sticks don't have any logic to downclock when they get too hot, so they tend to become unstable instead.

TheNarc

@AnonymousRetard Yeah that's a pain. Have you tried running memtest86 on it to try to eliminate just a bad stick? Of course that would only be valid if it doesn't crash before 24+ hours of memtesting :)

AnonymousRetard

@TheNarc No, it's on my list of things to do, but since I already run my home network from the box I need to plan it a bit. I still have my old router so I could use that in the meantime I guess, but I will probably do it next weekend or something. The last crash (general protection fault) was 2.5 days ago now, but I don't have enough data yet to be able to tell if the fan is helping or not. It does restart and come online again pretty quickly though, so if the crashes happen seldom enough I guess it's not the end of the world...

TheNarc

@AnonymousRetard Ah yup, well good luck to you! Would be nice if things "just worked" sometimes, but then I guess they often do if you pay 3-4X what these machines cost so I can't complain too loudly :)

Octopuss

@TheNarc said in Topton N100 Reporting 402 MHz:

@stephenw10 Well I loaded another modded BIOS from here that exposes power & performance options so now I can enable or disable SpeedStep and SpeedShift, C-states, change PL1 and PL2, etc. But nothing seems to meaningfully move the needle on this . . . incredibly frustrating. If anyone else who has been following this is running N100-based hardware, I'd be curious to know the results of:
openssl speed -elapsed -evp aes-256-cbc
for you. I can't help but wonder whether this is really highly specific to the hardware/BIOS combination I have, or if performance may be degraded generally for the N100 in FreeBSD, just not so much that it's generally noticeable when used for applications such as pfSense. That seems highly unlikely, but any points of comparison would be welcome. Thank you!

[2.7.2-RELEASE][admin@rozcestnik.lan]/root: openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing AES-256-CBC for 3s on 16 size blocks: 102276747 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 64 size blocks: 35529030 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 256 size blocks: 9200231 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 1024 size blocks: 2310933 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 8192 size blocks: 290606 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 16384 size blocks: 144929 AES-256-CBC's in 3.00s
version: 3.0.12
built on: reproducible build, date unspecified
options: bn(64,64)
compiler: clang
CPUINFO: OPENSSL_ia32cap=0x7ffaf3bfffebffff:0x98c007bc239ca7eb
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-CBC        545475.98k   757952.64k   785086.38k   788798.46k   793548.12k   791505.58k

I have a CWWK unit with N100 CPU and two ethernet ports.
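
For anyone comparing results, the summary row can be cross-checked against the raw "Doing ..." lines: throughput is block size times operation count divided by elapsed seconds, reported in thousands of bytes per second. Below is a small Python sketch of that arithmetic. The regular expression and function name are my own, written against the output format shown above, and are not part of OpenSSL.

```python
import re

# Matches lines such as:
# "Doing AES-256-CBC for 3s on 16 size blocks: 102276747 AES-256-CBC's in 3.00s"
LINE_RE = re.compile(
    r"Doing (\S+) for (\d+)s on (\d+) size blocks: (\d+) \S+ in ([\d.]+)s"
)


def throughput_kbytes_per_s(line):
    """Return throughput in 1000s of bytes per second for one 'Doing ...' line."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("not an openssl speed progress line: %r" % line)
    block_size = int(match.group(3))   # bytes per operation
    count = int(match.group(4))        # operations completed
    elapsed_s = float(match.group(5))  # measured wall-clock time
    return block_size * count / elapsed_s / 1000.0


if __name__ == "__main__":
    line = ("Doing AES-256-CBC for 3s on 16 size blocks: "
            "102276747 AES-256-CBC's in 3.00s")
    print("%.2fk" % throughput_kbytes_per_s(line))
```

For the 16-byte line above this gives 545475.98k, which matches the first column of the summary row, so numbers from different boxes can be compared directly as long as the same `-elapsed` option is used.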

TheNarc

@Octopuss You're definitely seeing much better numbers on that benchmark than I did when I started. Although I was able to get a bit higher with the PL1 bump to 9W (at least I think that's what made the difference). But before I changed anything, my numbers on that test were around 100k across the board, so you're already 5-7X better than that.

Octopuss

@TheNarc I have this unit https://cwwk.net/products/cwwk-x86-p5-super-mini-router-12th-gen-intel-n100-ddr5-4800mhz-firewall-pc-2x-i226-v-2-5g-lan-fanless-mini-pc and Crucial 4800MHz memory. Even if your motherboard is a little bit different, the CPU is the same, so your situation is even weirder than mine.

TheNarc

@Octopuss Yeah if I had to guess, and that's all I can really do, my stock BIOS had the PL1 set to 6W and it did not expose that setting for me to be able to change. Now, as far as I can tell that should be fine, because the TDP for the N100 is 6W. Of course, Intel's definition of TDP is "Thermal Design Power (TDP) represents the average power, in watts, the processor dissipates when operating at Base Frequency with all cores active under an Intel-defined, high-complexity workload." So TDP is an average, and PL1 sets a ceiling, I think. But I'm fairly certain now that increasing my PL1 setting in the modded BIOS I loaded from 6W to 9W is what got me to my expected level of performance. To be certain, I'd need to do more testing, and I spent so many hours playing with the thing recently that I'm not inclined to right now :) But perhaps you got a BIOS that already has PL1 set to 9W, or something above 6W. At least, that's the best hypothesis I can assemble from this collection of speculation and unscientific troubleshooting that I've done :)

TheNarc

One remaining quirk that would be nice but not necessary to solve: any idea why I would have (seemingly accurate and changing) core temps reported on the dashboard, but nothing at all for Thermal Sensors under Status > Monitoring?

EDIT: Found this post and will try doing what was described there and report back.

AnonymousRetard

@TheNarc I tried deleting the file from that thread but it just removed my sensor data completely. Was not recreated on reboot either. Instead I used the delete data button which deleted all data but also recreated the sensor data which now shows all sensors!

In other news I ran memtest86+ today and it passed on the first run but failed on the second. Temperatures very slowly climbed during the entire first run up to ~60-63C when I had no fan running during the test. I didn't see exactly what the temp was when the failure was recorded but it was around 60 degrees when I saw the error but there was also a recorded maximum of 73 degrees. I tried removing the back of the case and touched the RAM but it was too hot to touch, not possible to hold very long without burning myself.

I have now ordered a thin SODIMM RAM heatsink, but I'm doubtful it will help all that much if the RAM and NVMe stick get cooked in this little oven with no air exchange. Therefore I also ordered a 120mm adjustable-speed USB fan which I could just place on top of the case. I would prefer to run it fanless though, since that looks much nicer and requires less power. As a temporary measure I have increased my board's default for Temperature Activation Offset from 25 to 53, which I think should activate the Intel "Thermal Control Circuit" at TjMax - offset. In this case TjMax should be 105 for the N100, so with my offset it should activate at 105 - 53 = 52 degrees. It seems to be working, since I haven't really seen a temperature above ~53 degrees yet (Intel mentions some overshoot will happen). They also say though that a system's thermal constraint should really be controlled by the PL1 setting and not the TCC offset. Probably much more performance is lost by the actions the TCC has to take to limit the temperature compared to a power limit from PL1.
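
The offset arithmetic above is simple enough to sketch in a few lines of Python. This assumes, as the post does, that the TCC activates at TjMax minus the configured offset and that TjMax is 105 for the N100. The function name is made up for illustration.

```python
def tcc_activation_temp_c(tj_max_c, offset_c):
    """Temperature (in degrees C) at which the Thermal Control Circuit should engage.

    Assumes activation happens at TjMax minus the BIOS offset, as described
    in the post above; real hardware may overshoot this somewhat.
    """
    if offset_c < 0 or offset_c > tj_max_c:
        raise ValueError("offset must be between 0 and TjMax")
    return tj_max_c - offset_c


if __name__ == "__main__":
    TJ_MAX_N100_C = 105  # value quoted in the post, not read from hardware
    for offset in (25, 53):
        print("offset %d -> TCC at %d C" % (offset, tcc_activation_temp_c(TJ_MAX_N100_C, offset)))
```

With the board default of 25 this gives 80 degrees, and with the temporary offset of 53 it gives 52 degrees.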

It's a bit sad though to have to limit the CPU so heavily because of the other components in the box, because the CPU itself is fine with much higher temperatures than this. The much higher temperature offset I'm running now while waiting for the additional parts to arrive seems to have lowered my OpenSSL test performance by about 20-25% when running fanless. I tried with my old fan as well, and performance was a bit better but still much worse than original; I don't remember the exact numbers. For now I won't do further testing (not even sure the limit I set makes the system fully stable), because it seems pointless to spend time testing a temporary solution. Probably the final solution will be SODIMM heatsink + the custom BIOS with a lower PL1 setting and a less restrictive TCC offset, or a bit less restrictive PL1 and TCC offset with the fan (maybe just original settings).

TheNarc

@AnonymousRetard Yeah not sure what the deal with the RRD was but I saw the same thing. Deleted all the data, and then the thermal sensors came back. So far so good!

Sorry to hear about your RAM test results. I'm not an expert, but I'd be a bit surprised if 63C was too hot. Even 73 doesn't seem outrageous. Not ideal, for sure, but I wonder if it's just a bad stick as opposed to a heat-induced failure. I guess you'd need to run a control under a more temperature-controlled environment to be certain. Best of luck with your heatsink approach. It is annoying that these little boxes don't make it easier to add a fan if you're so inclined. The totally passive cooling is attractive, but only when well-designed, which these are not so much haha. This one I deployed for my family is going to be in a basement so I'm hoping it will be alright without needing to add a fan to it, but time will tell!

AnonymousRetard

@TheNarc I know all components get a lot less stable when overclocked, which is something I have done a lot with regular desktop systems. Both modern CPUs and GPUs automatically downclock to maintain stability when temps go higher because of this, but RAM sticks don't. Now this RAM stick shouldn't be overclocked, but there seem to be pretty much no settings regarding RAM in the BIOS, not even in your unlocked one. Another possible solution would perhaps be to downclock the RAM if it was possible and/or give it a bit of a lower voltage. But it really should be running at factory settings. I haven't really been able to find any specifications of what ambient temperatures it should be tolerating at stock settings though. But there definitely won't be any room for overclocking the RAM at these ambient temperatures. Unfortunately, since there's no working ambient temperature sensor in the box and no temperature sensor I've been able to read from the RAM stick, I don't fully know what either temperature is, but I feel like the ambient temperature in the box will probably settle not much lower than the CPU temps, since the box is very small and badly ventilated and the big CPU heatsink will probably almost equalize with the air temperature inside the box...

All I know is the RAM stick after it had failed was so hot that it burned my fingers by just touching the backside of its PCB for ~2 seconds, which I don't think is very good. I wouldn't be surprised if the RAM without any heatsink becomes 20-40 degrees higher than the ambient temperatures around it when stressed, so if the CPU has made the ambient temperature close to 70 I wouldn't be surprised if the RAM starts approaching or even exceeding 100C... But yeah I'm making a lot of guesses here. I don't really know what normal ambient temperatures in a laptop are either, I guess SODIMM ram sticks should be expected to be installed in laptop systems.

I'll have to continue testing once I have more cooling solutions... If it turns out to be unstable even with much better temps I guess I'll just order another RAM stick instead and try with that. I'll post updates here once I have done more testing but it'll take a while before everything arrives.

AnonymousRetard

@AnonymousRetard An update: My system has now been stable for about 2 weeks with no crashes with a speed adjustable 120mm usb fan on top of the case running on the lowest setting. This is a very long uptime compared to what I could get before without cooling so it seems like the system can be made stable with a better cooling solution.

Today the RAM heatsinks arrived but they are just super-thin copper/graphene films and I don't have very high hopes that they'll do much. But I am now running memtest86+ again without any fan and with the heatsinks.

This test is now being run with a temperature activation offset of 37 instead of the 53 I mentioned earlier because with 53 and no fan I actually got another type of problem where my internet connection died after a while and never came back until I rebooted pfSense. There was some strange message in the kernel logs (dmesg) about some buffer or something being full but I don't fully remember what it said. But I googled it and there was some recommendation to increase the size of this buffer but I already had it quite large (above the default) so I didn't think it was appropriate to increase even more.

I think an offset of 53 should mean that the CPU will try to keep temperatures below 52C (105-53) by downclocking and eventually turning off clocks completely, which might not really be doable without any active cooling, and the extreme measures the CPU takes to try to stay below that temp might trigger some strange FreeBSD bug or something... (In my mind a full buffer should be a recoverable problem, not requiring a reboot to fix). The new setting of 37 should make the CPU try to stay below 68C, which should be more reasonable with only passive cooling.

I will update the thread probably one last time when I feel like I have a final stable solution. Preferably one without a fan but if this test fails that's probably what I will end up with.

zoltar

Good afternoon all, this week I received my equipment, a CWWK-N100-4L-01 (LH-N100-4L-V2). I found this thread while searching, because I have observed the same problem with the CPU: in its current state it does not show more than 2GHz when I stress it.
During the initial installation (the first baremetal) I had problems trying to mount two nvme in raid. The installation failed, and when trying to manage the disks with other utilities to eliminate the raid swap that I had created and zero the disk, the computer crashed because it was very hot. I gave up and removed the adapter that came with it to place the second M.2 in the port for the Wi-Fi card, and installed it on a single NVME with a cooler, I was afraid of burning it.
The S.M.A.R.T. data shows no problems and the system seems to be over 50°C. The CPU at idle is around 40°C. I expected a lower temperature, since the system has some packages installed but has no load. I was already scared after reading about burned memory modules while trying to solve the problem, so I stopped touching the BIOS since I don't have much knowledge.
I didn't initially enable PowerD in pfSense so that it used the more modern Speed Shift technology, but when I saw this I tried all the combinations...I enabled C-States, I disabled Speed Step, etc...I loaded the default values again and nothing has worked.
The device does not appear on their website, I cannot find the manual, I do not know which BIOS corresponds to it, the ftp is in Chinese...
Does anyone reading the post have the same equipment as me, or any clue as to what could be happening? I think they have to be the default values provided by the BIOS, but I don't know what they are. I bought it without disk or memory (my memory and disks are Crucial and it recognizes them without problems) but it would bother me if the BIOS values are not set by default so that the cpu works correctly.

TheNarc

@AnonymousRetard Thanks for the update and glad to hear it's going well for you. For reference, although I'm not sure how useful it will be to you, my machine has been stable with CPU temps generally being between about 40 and 45 with brief maximums up to about 53. Here's the graph over the last week:

I'm not as sure what to make of the nvme temp, because the SMART data reports "Temperature" as 45 but "Temperature Sensor 1" as 63, which is a rather large difference. But also it was a $15 drive, so I'm not inclined to worry about it too much now and figure I'll just see whether it remains stable (and if it does fail, after how long).
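
If anyone wants to log both readings over time, here is a rough Python sketch that pulls the temperature lines out of saved SMART text. The sample text is made up to mirror the two values mentioned above, and its layout (label, colon, value, "Celsius") is an assumption modeled on typical `smartctl -a` output for NVMe drives, so the pattern may need adjusting for other tools.

```python
import re

# Matches "Temperature:" and "Temperature Sensor N:" lines at the start of a line.
TEMP_RE = re.compile(
    r"^(Temperature(?: Sensor \d+)?):\s+(\d+) Celsius", re.MULTILINE
)


def nvme_temperatures(smart_text):
    """Return a dict mapping each temperature label to its value in degrees C."""
    return {label: int(value) for label, value in TEMP_RE.findall(smart_text)}


# Hypothetical sample using the two readings quoted in the post above.
SAMPLE = """\
Temperature:                        45 Celsius
Temperature Sensor 1:               63 Celsius
"""

if __name__ == "__main__":
    readings = nvme_temperatures(SAMPLE)
    print(readings)
    print("spread:", max(readings.values()) - min(readings.values()), "C")
```

Tracking the spread between the two readings alongside the CPU temps would at least show whether the drive heats up with load or just follows the ambient temperature in the case.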

TheNarc

@zoltar I might advise running the same openssl speed -elapsed -evp aes-256-cbc command that I'd been using to test my N100 machine to see if it really seems to be getting "underclocked". I've posted my own results in this thread that you can compare to. I'm not getting any useful hits on the two model numbers you mentioned, so I can't be sure whether we have the same hardware or not. Mine was Topton branded, not CWWK, but I know a lot of these are just rebrands of pretty much identical hardware. I certainly wouldn't want to advise you to flash a BIOS without being 100% certain though.

As you can see from the other post I just made, my CPU temps tend toward the mid-40s most of the time. I think it's fair to say that 50 is not bad at all either for passive cooling. I wouldn't expect that to be causing crashes. Although there have been reports of varying mechanical quality of these machines with some exhibiting gaps between the surface of the processor dies and the heatsink. If you're comfortable doing so (and have replacement thermal compound) you might disassemble it and check for that, if you're convinced that it was crashing due to overheating.

zoltar

Yes, I have Arctic paste, but the temperatures do not worry me now. What worries me is getting the processor unlocked so that it starts to clock up when the device has a workload.
My output:
openssl speed -elapsed -evp aes-256-cbc.png