TROUBLESHOOTING TEMPERATURE WITH AMD PROCESSORS

 

Introduction

A number of people are calling us with the same question: "I have such and such AMD processor with your cooler, and my temperatures seem kind of high compared to the reviews I read on your product. Can you help?"

Answering this question may seem like a simple technical support issue at first glance, but it is actually quite a complex proposition, and this is the reason for this article.

Most casual users, and sometimes not-so-casual ones, rely solely on the motherboard monitoring tools to record their CPU temperature. Unfortunately, these tools are not only inaccurate, but also inconsistent from one motherboard brand to another. Even comparing two coolers using the same CPU frequency and voltage setup on the same motherboard often yields erratic and highly inaccurate numbers.

In the following paper, we shall try to explain some of the discrepancies that we have observed in our lab, and establish the conditions that are necessary to conduct reasonably accurate testing of Socket A CPU coolers. Our intention is not to have our readers set up a professional lab, but simply to establish basic guidelines for quick troubleshooting purposes.

Technical Preamble

Later on in the article, we will often refer to our own test measurements, and compare them to the temperatures returned by motherboard monitoring tools. Here is a brief description of our testing procedure. The explanation provided below is meant to establish the validity of our measurements only, and not meant as a do-it-yourself recommendation:

CPU die temperature is measured by inserting a thermocouple inside the base of the heat sink. This is done by drilling a .050" hole in the base, per AMD specification (23794-D, March 2001), as shown here (see warning note below):

The thermocouple is connected to an electronic thermometer calibrated to 0.01°C. Ambient temperature is measured with a second thermocouple, positioned one foot above the fan intake. 

Warning note! We categorically advise users against attempting to drill their heat sinks as shown above, particularly in the case of copper base products. This requires sophisticated professional equipment, and cannot be done with hand or hobby tooling.

 

1. Why are the motherboard temperature monitors inaccurate and inconsistent?

Technical background

Because AMD CPUs have no internal thermal diode such as the one found in Intel® processors, motherboard manufacturers had to resort to external devices (thermal probes) to measure the temperature. The probe is usually located inside the socket, right underneath the CPU, and it measures the air temperature inside the socket. In an effort to report temperatures as close as possible to actual CPU die temperature, manufacturers use mathematical formulas to extrapolate the die temperature from the measured socket air temperature.

Issues

The problem is that different manufacturers use different types and brands of probes, locate their probes in different areas of the socket, and use different formulas to calculate the temps. This results in significant variations from one motherboard to another, even with the same CPU. It is well documented, for example, that Abit motherboards are on the optimistic side (reporting colder temps than actual), while Asus motherboards are usually pessimistic (hotter than actual). Another significant disadvantage of this method is that the temperature inside the socket is highly dependent upon the amount and direction of air flow outside of the socket area.

                               Papst Fan      Liquid Cooling
Processor                      Tbird          Tbird
Frequency (MHz)                1400           1400
O/C (MHz)                      1600           1600
Voltage (V)                    1.85           1.85
Watts                          92             92
Motherboard                    IWILL KA26     IWILL KA26
Swiftech Model                 MCX370         MCW372
Thermocouple (°C)              39.9           32.3
Motherboard CPU Monitor (°C)   41             45
Ambient (°C)                   22             22
C/W                            0.19           0.11
Measured in                    BIOS           BIOS

What we see above is an impossibility!  The motherboard monitor is reporting a higher CPU temperature with an MCW372 water block (45°C)  than with an MCX370 air cooled heat sink (41°C).  Conversely, the thermocouple is reporting a much cooler 32.3°C with the MCW372 than the MCX370's  39.9°C.    Without this reference provided by the thermocouple, one would easily think that the liquid cooling  setup is improperly installed or defective, whereas in reality the CPU is running significantly cooler than what is reported by the motherboard. 
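The C/W row of the table follows directly from the other rows: thermal resistance is the rise of the CPU temperature above ambient, divided by the heat dissipated. A minimal Python sketch (the function name is ours, chosen for illustration) reproduces the two published values from the thermocouple readings:

```python
def thermal_resistance(cpu_temp_c, ambient_c, watts):
    """Return C/W: degrees of rise above ambient per watt dissipated."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return (cpu_temp_c - ambient_c) / watts


# Values taken from the table above: thermocouple readings, 92 W, 22 C ambient.
air = thermal_resistance(39.9, 22.0, 92.0)     # MCX370 with Papst fan
liquid = thermal_resistance(32.3, 22.0, 92.0)  # MCW372 water block

print(f"MCX370: {air:.2f} C/W")
print(f"MCW372: {liquid:.2f} C/W")
```

Feeding the motherboard monitor readings into the same function instead would rank the two coolers in the wrong order, which is exactly the trap described above.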

What happened?

Since the water block doesn't generate any airflow around the socket, the air inside gets hotter, and the motherboard reports an erroneous CPU temperature. Making things even worse, this particular board doesn't have a cooling fan installed on the Northbridge. Verifying this theory was quite easy: by installing a small 40mm fan on the Northbridge, we created sufficient airflow in the area, and the motherboard CPU monitor reading suddenly dropped from 45°C to 39°C! This is still an unacceptable 7°C difference compared to the actual CPU temperature, but at least it is getting closer to the truth! Let us clarify that this particular motherboard has otherwise proven to be fairly accurate when used with our MC370 and MC462 class coolers.

This example clearly demonstrates the futility of comparing heat sinks with different airflow patterns by simply recording temps with the motherboard.
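To put numbers on the liquid cooling example, the monitor error can be computed against the thermocouple reference. This is a small sketch with a function name of our own choosing. It assumes the thermocouple reading stayed at 32.3°C after the Northbridge fan was added, which is how the 7°C figure above was obtained:

```python
def monitor_error(monitor_c, thermocouple_c):
    """Positive result: the motherboard monitor reads hotter than the die."""
    return monitor_c - thermocouple_c


THERMOCOUPLE_C = 32.3  # MCW372 water block, reference measurement

before_fan = monitor_error(45.0, THERMOCOUPLE_C)  # no airflow around the socket
after_fan = monitor_error(39.0, THERMOCOUPLE_C)   # 40mm fan on the Northbridge

print(f"Error without Northbridge fan: {before_fan:.1f} C")
print(f"Error with Northbridge fan:    {after_fan:.1f} C")
```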

How, then, can a user get a good idea of how efficient his system is under such circumstances?

For those users in search of the most accurate results, there is only one solution: most modern motherboards now have headers to connect external temperature sensors. These sensors are called thermistors, and they must be taped as close as possible to the CPU core. Warning! Such devices must be chosen very carefully: there is only 30/1000" of clearance between the surface of the core (where the heat sink must make perfect contact) and the core packaging (the circuit board around the core, where the thermistor must be taped). If the thermistor is too thick, it will prevent proper contact between heat sink and CPU core, and this will damage or destroy the CPU!
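Since thermistor thickness is often quoted in millimeters, a quick conversion of the 30/1000" clearance helps when choosing a sensor. The sketch below is only an illustration. The function name is ours, and we assume the thickness passed in includes the tape holding the thermistor in place:

```python
MM_PER_INCH = 25.4
CLEARANCE_IN = 30 / 1000  # clearance quoted above, in inches


def thermistor_fits(thickness_mm, clearance_in=CLEARANCE_IN):
    """True if the thermistor (tape included) is strictly thinner than the gap."""
    return thickness_mm < clearance_in * MM_PER_INCH


print(f"Clearance: {CLEARANCE_IN * MM_PER_INCH:.3f} mm")
```

A sensor that only just passes this check leaves no margin at all, so when in doubt choose a thinner one.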

 

Is using a thermistor an absolute necessity?

More casual users do not have to install a thermistor to monitor their CPU temps. There are excellent utilities, such as Radiate, that calculate the theoretical CPU temperature from the manufacturer-provided C/W rating, ambient temperature, CPU frequency, voltage, and so on. With this data in hand, users can recalibrate their motherboard monitor to reflect more accurate temps.
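The kind of calculation such a utility performs can be sketched in a few lines. This is not Radiate's actual code. It assumes the common approximation that dissipated power scales linearly with frequency and with the square of the voltage, and the stock figures used (72 W at 1400 MHz and 1.75 V) are assumptions for illustration rather than data from this article:

```python
def scaled_power(stock_watts, stock_mhz, stock_volts, mhz, volts):
    """Estimate dissipated power after a frequency/voltage change.

    Assumes power is proportional to frequency and to voltage squared.
    """
    return stock_watts * (mhz / stock_mhz) * (volts / stock_volts) ** 2


def theoretical_cpu_temp(ambient_c, c_per_w, watts):
    """Theoretical CPU temperature: ambient plus C/W times power."""
    return ambient_c + c_per_w * watts


# Assumed stock figures for illustration only (not from this article).
watts = scaled_power(72.0, 1400, 1.75, mhz=1600, volts=1.85)
temp = theoretical_cpu_temp(ambient_c=22.0, c_per_w=0.19, watts=watts)

print(f"Estimated power: {watts:.0f} W")
print(f"Theoretical CPU temperature: {temp:.1f} C")
```

With these assumed inputs, the power estimate comes out at roughly 92 W, in line with the Watts row of the table in section 1. The gap between the theoretical temperature and the BIOS reading is the correction to keep in mind when reading the motherboard monitor.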

 

2.  The critical importance of ambient temperature

Why is ambient temperature critical?

Recording accurate ambient temperature is absolutely essential to any serious troubleshooting approach. All cooling systems are tightly coupled to the ambient conditions they operate in, and heat sinks are no exception. In other words, the CPU temperature rises and falls with the ambient temperature. For a regular heat sink-fan combination, the general rule is simple: a 1°C rise in ambient temperature produces a 1°C rise in CPU temperature.
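The 1:1 rule makes it easy to bring a home reading and a published result onto the same footing. A minimal sketch follows. The function name and the example readings are hypothetical, and the rule applies to regular heat sink-fan coolers only:

```python
def normalize_to_ambient(cpu_temp_c, measured_ambient_c, reference_ambient_c):
    """Shift a CPU reading to what it would be at a reference ambient.

    Applies the 1:1 rule for heat sink/fan coolers.
    """
    return cpu_temp_c + (reference_ambient_c - measured_ambient_c)


# Hypothetical home reading: 48 C CPU with 30 C measured inside the case,
# compared against a review performed at 22 C ambient.
print(normalize_to_ambient(48.0, 30.0, 22.0))
```

In this hypothetical case, the 8°C difference in ambient accounts for 8°C of the CPU reading, so the cooler should be judged against the review as if it read 40°C.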

How and where to measure ambient temperature?

Since your CPU operates inside a computer case, ambient temperature should always be measured inside the case. Automotive dashboard-mount digital thermometers can be used for this purpose (about $10 at Radio Shack). They feature a digital display, a convenient sticker or Velcro to attach the display anywhere outside the box, and a long thermal probe that can be fixed inside the case. Location of the probe is important. One should always try to locate the probe in the general area of the cooling fan intake, in order to measure the temperature of the air that is actually being used to cool the heat sink.

A word of caution!

The most common oversight users make when comparing their cooler's performance against manufacturer-published data is failing to consider the following: most manufacturers (Swiftech included) publish ambient temps recorded under tightly controlled laboratory conditions. While guaranteeing a high degree of accuracy, such conditions translate into generally lower ambient temperatures than the average operating conditions found in common households. Attempting to reproduce such conditions in your home PC is not realistic. On the other hand, if you record your ambient temperature appropriately, you can use Radiate to determine with certainty whether your high CPU temperature is due to high ambient temperature or to something else.

 

3. Recording CPU temperatures

a. Understanding CPU load and CPU idle

First, let us clarify two of the most common misconceptions in terms of CPU load: 

  • CPU is not idle in the BIOS

  • CPU is not idle under Windows 98, even when no applications are running

Your CPU can pretty much be compared to your car engine: as soon as you turn the ignition on, a thermal load is created. The comparison can be extended even further: thermal load increases as you go faster (open and run applications), or drive uphill (play games). In the computer world, there are utilities that can force shut-down of the "non-vital" CPU activities, the equivalent of placing your transmission in "Neutral" while going downhill: the car is still moving, but the engine is under no load. Such utilities include CPUIdle and HWMonitor for Windows 98; the equivalent function is built into Windows NT/2000.

From a heat sink manufacturer's point of view, measuring temperatures at idle can provide useful data, but from a user's standpoint, such measurements are pretty useless. Conversely, measuring CPU temperatures at "full load" for extended periods of time is an interesting benchmark for extreme use purposes, but raises the following question: how do you apply full load to the CPU? A number of hardware review sites use Seti@Home, Prime 95, Quake III and so forth to place a load on the CPU, but these loads are not necessarily consistent with each other. From a purely scientific point of view, the only accurate way to measure a cooling system's performance is to use a calibrated heating device, a method that cannot be used by the average user.

b. Gathering CPU temperature data

For troubleshooting purposes we prefer to record CPU temperatures at the BIOS level. This is the only measurement that is completely independent from the operating system, and that can be verified by any user. This method provides an excellent "baseline" and is completely consistent in any computer configuration. It also paints a realistic picture of average computer use, since computers are neither always at full idle nor always at full load.

When recording CPU temperature data, one should always allow a minimum period of time to pass before noting the results. Depending on the cooling system, temperatures take some time to stabilize. For example, a water cooling system may take up to an hour to reach equilibrium, whereas a stock heat sink will normally reach this point within 15 minutes. We recommend waiting approximately half an hour for conventional air cooling systems (heat sink/fan) and one hour for liquid cooled systems.
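For users who note their readings at regular intervals, a simple check can confirm that the temperature has actually settled before a value is recorded. The sketch below is only an illustration. The window size, the tolerance, and the sample readings are assumptions of ours, and the waiting times recommended above remain the guideline:

```python
def is_stable(readings_c, window=5, tolerance_c=0.5):
    """True once the last `window` readings span no more than `tolerance_c`."""
    if len(readings_c) < window:
        return False
    recent = readings_c[-window:]
    return max(recent) - min(recent) <= tolerance_c


# Hypothetical readings noted every few minutes after power-on.
warming_up = [31.0, 35.5, 38.0, 39.5, 40.5]
settled = warming_up + [41.0, 41.0, 41.5, 41.0, 41.0]

print(is_stable(warming_up))
print(is_stable(settled))
```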

 

4. Step-by-Step approach to temperature troubleshooting

  • Record your ambient temperature as outlined in paragraph 2

  • Use Radiate (click on the link to download).

To have a valid point of comparison, you need to establish what the CPU temperature should be in theory. Radiate is the most convenient consumer tool available to date for this purpose. Using sliders, enter your CPU core frequency, voltage, ambient temperature, and manufacturer-provided C/W rating, or pre-defined heat sink/fan combination. The software then returns what the CPU temperature should be.

Here is a screen shot:

  • With this data in hand, you can now go into your BIOS and compare it with your motherboard monitor reading. We think it can safely be said that any difference greater than 10°C reflects a problem of some kind. The question is then: what is the problem?
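The comparison in the last step can be written down as a simple rule. This is a sketch under our own naming. The 10°C threshold is the one suggested above, and the example temperatures are hypothetical:

```python
THRESHOLD_C = 10.0  # difference beyond which a problem is suspected


def needs_troubleshooting(theoretical_c, monitor_c, threshold_c=THRESHOLD_C):
    """True if the BIOS reading differs from theory by more than the threshold."""
    return abs(monitor_c - theoretical_c) > threshold_c


# Hypothetical example: the theoretical temperature is 40 C.
print(needs_troubleshooting(40.0, 46.0))  # BIOS reads 6 C above theory
print(needs_troubleshooting(40.0, 55.0))  # BIOS reads 15 C above theory
```

The check uses the absolute difference, so a monitor that reads far below the theoretical value is flagged as well.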

 

5. Quick approach to problem identification

With the exception of liquid cooling systems (which, as we saw in paragraph 1, present a real challenge to current motherboards), most abnormal heat sink temperature readings can be grouped into two categories:

  • incorrect data to begin with, or

  • incorrect assembly

a. Incorrect data

We have noticed that, in the process of gathering data, a number of users disregard or misjudge the importance of the fan they use with their heat sink. We cannot stress enough that heat sink performance, in other words the manufacturer-provided C/W, is absolutely dependent upon the type of fan used. One must make absolutely sure that the heat sink is being tested with the fan specified for that C/W rating, operating at its recommended voltage. In this respect, Radiate is extremely flexible: the heat sink database (coolers.dat) is a text file, and can be easily edited to add particular heat sink models with different types of fans. You can obtain Swiftech's up-to-date coolers.dat file by sending us an e-mail.
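The point that a C/W rating belongs to a heat sink and fan pairing, not to the heat sink alone, can be illustrated with a lookup keyed on both. This is a hypothetical structure of our own and does not reproduce the coolers.dat format. The single entry uses the MCX370 value from the table in section 1:

```python
# C/W ratings keyed by (heat sink, fan). Only the MCX370 entry comes from
# the table in section 1; the structure itself is hypothetical.
CW_RATINGS = {
    ("MCX370", "Papst"): 0.19,
}


def lookup_cw(heat_sink, fan):
    """Return the C/W rating for an exact heat sink/fan pairing."""
    try:
        return CW_RATINGS[(heat_sink, fan)]
    except KeyError:
        raise LookupError(
            f"No C/W rating for {heat_sink} with fan {fan!r}; "
            "a rating measured with another fan does not apply"
        ) from None


print(lookup_cw("MCX370", "Papst"))
```

Refusing to fall back on a rating measured with a different fan is deliberate: using the wrong rating is precisely the "incorrect data" problem described above.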

b. Incorrect assembly

  • The single most common cause of poor performance is incorrect contact between the CPU die surface and the heat sink. There is only one method to verify whether the assembly is correct: inspecting the imprint left by the thermal grease on the base of the heat sink. Here is what a perfect imprint should look like:


Good!
The imprint is faint and even, showing that the grease was properly squished out by the perfectly flat assembly.
Bad!
The imprint is uneven, showing some areas with practically no grease, and others with a thicker coat.

 

  • If the imprint is uneven or partial, you can pretty much conclude that there is something wrong with your assembly. One word of caution though: we have seen users literally "plaster" their CPU with thermal compound. If the coat of thermal compound is too thick, you just cannot correctly judge the imprint left on the heat sink. Thermal grease should be applied with a razor blade, held at a 45° angle between thumb and index finger, and only an amount sufficient to create a paper-thin coat is necessary.

Jul 7, 2001