technology from back to front

Thermostat defeat

When I started on this, I thought I’d be able to dash off, in a few hours, a script to keep my CPU fan quiet. I’ve just spent far too much of this weekend obsessively hacking on it and testing it, and after creating a tool of great sophistication, I have basically given up in defeat. I’m now using a thermostat-like approach: the fans are either on full or on minimum, with nothing in between.

First installment of the saga

Why didn’t I just do that in the first place? Because it’s quite distracting when the fan suddenly gets louder or quieter, so if your system is regularly switching between the two to keep itself at the right temperature, it could be more irritating than having the fan on full blast all the time. It’s much more desirable to find just the right constant speed to maintain temperature, and stay there.

So I wrote a proper feedback loop which measures the temperature and makes constant small adjustments to the fan speed to keep it in range. I even wrote a sophisticated and effective scheme for mapping fan power to fan speed, which learned what fan power to use to reach a specific speed, which learned where the fan stopped and avoided it, and learned how to get it restarted if it stopped. But I’ve abandoned that now.

The first problem with this approach is that when you change the fan speed, it takes too long to find out what the result will be. As I explained in the previous article, for a given workload and fan speed the system will eventually find an equilibrium temperature. But it turns out that it can take several minutes to get close to that equilibrium temperature. Only when you’ve reached equilibrium can you really know what adjustment you should be making to get the fan speed right next. If you try to make the adjustments early, you’ll get exactly the loop I thought wouldn’t be a problem in the previous article: the fan speed will zoom from too high to too low and back.
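One way to picture why the wait is so long is to assume, purely for illustration, that the CPU temperature approaches equilibrium like a simple first-order system. The time constant τ, the starting temperature T_0 and the equilibrium temperature T_eq are symbols introduced here, not measurements from the article:

```latex
T(t) = T_{\mathrm{eq}} + \bigl(T_0 - T_{\mathrm{eq}}\bigr)\, e^{-t/\tau}
\qquad\Rightarrow\qquad
\frac{T(3\tau) - T_{\mathrm{eq}}}{T_0 - T_{\mathrm{eq}}} = e^{-3} \approx 0.05
```

Under that assumption you have to wait about three time constants before the reading is within 5% of where it will finally settle, and any adjustment made sooner is based on a temperature that is still moving.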

If the CPU temperature is too high, you can’t afford to wait minutes; the CPU could be damaged. So to do this safely, you have to start with the fan at maximum, then very slowly bring it down until the CPU temperature reaches the target. If the temperature goes above a safe threshold at any point, you must then turn the fans on to maximum and start all over again. If you manage to avoid this fate, you should find the correct fan speed within about a quarter of an hour.
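The cautious search described above might look roughly like this in C. It is a sketch under stated assumptions, not the abandoned script: `ramp_cfg`, `ramp_step` and all field names are hypothetical, and the function is meant to be called once per settling interval with the latest temperature reading.

```c
#include <assert.h>

/* Hypothetical configuration for the slow ramp-down; every name and
 * value here is an illustrative assumption, not taken from the article. */
typedef struct {
    int target_c;    /* temperature we would like to settle at */
    int safe_max_c;  /* above this, abandon the search and go to maximum */
    int min_power;   /* lowest fan power setting we allow */
    int max_power;   /* highest fan power setting */
    int step;        /* how much power to shed per settling interval */
} ramp_cfg;

/* Decide the next fan power from the current power and temperature.
 * The search is expected to start with power == cfg->max_power. */
int ramp_step(const ramp_cfg *cfg, int power, int temp_c)
{
    assert(cfg->min_power <= cfg->max_power);
    assert(cfg->step > 0);
    assert(cfg->target_c <= cfg->safe_max_c);

    if (temp_c > cfg->safe_max_c)
        return cfg->max_power;  /* too hot: back to maximum, start over */

    if (temp_c < cfg->target_c) {
        /* Still cooler than the target: ease the fan down one step,
         * but never below the minimum setting. */
        int next = power - cfg->step;
        return next < cfg->min_power ? cfg->min_power : next;
    }

    return power;  /* at or above target but still safe: hold this speed */
}
```

With an illustrative configuration such as `{ 55, 61, 60, 255, 5 }`, a reading of 50 C at power 255 yields 250, while a reading of 62 C at any power jumps straight back to 255. The small step and the long interval between calls are what make the search take so long.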

Unless, of course, the CPU workload changes during that time. This brings us to the second problem with this approach: CPU workloads generally change much faster than that. Long before you’ve adjusted for the current workload, the system will have moved to another.

Actually, that’s not quite true; on my system at least there are two circumstances where the same workload is sustained for long periods of time. That is when it is basically idle, and when it is working flat out. When it is basically idle, even the slowest fan speed is enough to keep the CPU well within its specified temperature; in fact, with CPU frequency scaling it never goes above 41 C. For these purposes decoding video (i.e. watching TV) seems to count as “basically idle”. When it is working flat out, the fan needs to work flat out to keep it at 61 C, which is as high as I’m really happy to go.

Thus even if the script could magically determine the exact right fan speed for the work the system is doing and skip to it, it would still usually be shifting straight from ticking over to flat out as the workload changed. This is the third problem: at least on my system, workloads calling for a speed which is neither minimum nor maximum seem never to happen at all.

Once I’d painfully determined all this, it was clear that my amazingly sophisticated script could be replaced by a simple thermostat with no real loss in function. If the temperature goes above the high threshold, the fan kicks into full power. Once it drops below the low threshold, it spins down to its minimum speed. It turns out that this desperately simple scheme will do the right thing in all the circumstances I’ve observed so far.
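The two-threshold rule described above is small enough to sketch in a few lines of C. This is not the code destined for lm-sensors; `thermostat_cfg`, `fan_state` and `thermostat_step` are hypothetical names, and the thresholds are left to the caller because the article does not say which values were used.

```c
#include <assert.h>

/* The fan has only two settings in this scheme. */
typedef enum { FAN_MIN, FAN_FULL } fan_state;

/* Hypothetical configuration: the two thresholds of the thermostat. */
typedef struct {
    int low_c;   /* spin down once the temperature drops below this */
    int high_c;  /* spin up once the temperature rises above this */
} thermostat_cfg;

/* Return the fan state to use next, given the current state and the
 * latest temperature reading in whole degrees C. */
fan_state thermostat_step(const thermostat_cfg *cfg, fan_state current,
                          int temp_c)
{
    assert(cfg->low_c < cfg->high_c);

    if (temp_c > cfg->high_c)
        return FAN_FULL;  /* above the high threshold: full power */
    if (temp_c < cfg->low_c)
        return FAN_MIN;   /* below the low threshold: minimum speed */

    return current;       /* between the thresholds: change nothing */
}
```

The gap between the two thresholds is what keeps the fan from flipping back and forth around a single temperature: a reading inside the band never changes the state, so the fan only switches when the temperature has moved all the way across it.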

I also got rid of the wrapper/script structure I used to have. The thermostat is barely more complex than the wrapper – simpler once you take away the job of managing a child process – so I did away with the Python script and wrote the whole thing in C, which also means you don’t have to install Python to use it. It’s currently under 250 lines of code. After a little bit of cleaning up, this is what I will submit to lm-sensors.

If I ever observe dithering – frequent spinning up and slowing down – on any system, I can start to think about how to combat it. I currently think that if the more sophisticated approach is to be made to work, it must directly observe the CPU load – and, for systems with CPU frequency scaling, the current CPU frequency – in order to know how to respond to it. It will have to keep a record of how the temperature changes under different loads, at different CPU frequencies, with different fan speeds, at different CPU temperatures and different case temperatures, and from each datapoint infer a constantly-updated model of what the right fan speed for the current conditions is. It will have to cope with the imprecisions of all the measurements – one big problem is that at least on my system temperature is measured in whole degrees, giving us only the coarsest-grained information on how it is changing – to ever refine its picture of how to give the best results. And it must do so reliably, without endangering the CPU with dangerously high temperatures and while minimizing the speed of CPU temperature changes to prolong its life.

I might come back to it one day if the need ever arises. For now my 250-line thermostat will do me fine.

by Paul Crowley on 26/06/06
  1. Good stuff, I was about to head down this path and make my own PID controller based on current CPU load and temp… but now I’m having second thoughts.

  2. Amitava
    on 22/10/07 at 7:00 pm

    Nice useful discussions, will help the data center people who are breaking their head on similar stuff

 
 



2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK. +44 (0)20 7729 7060