Heat – The Death Knell for Hard Drives

Tuesday, 06 November 2007 11:10
In our digital age, nearly every facet of day-to-day life is affected by the computer. Whether at the bank or the grocery store, we are constantly reminded of the important role computers play in our daily tasks. American author/essayist Robert Fulghum wrote, "If you break your neck, if you have nothing to eat, if your house is on fire, then you got a problem. Everything else is inconvenience."

Similarly, when your computer stops working because it has overheated, you've got a problem. Whether the affected computer holds student homework, small business accounts, a first child's baby pictures, or company files representing hundreds of hours of work, computer failure has a major impact on our lives. As we store ever more kinds of information on our computers, a computer crash becomes a crisis, and everything else becomes a second priority. It is therefore crucial to assess the risks to your valuable data should your hard drive overheat.

 

Heat – It’s a Killer

All electronic components use conductors, materials that readily carry electrical current, to form a circuit path. Electronic conductors are usually made of metal, such as copper, gold, or aluminum. Every material offers some resistance to the current passing through it, and that resistance generates heat. Because each material's resistance is different, knowing how a conductor carries current is very important in component design. Electronic engineers work from specifications of different materials' resistance ratings and refer to these values many times during the design stage of a project.
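To put numbers on this, the sketch below applies Joule's law, P = I²R, to a conductor whose resistance follows R = ρL/A. The resistivity is the approximate handbook value for copper at 20 °C; the trace dimensions, the currents, and the function names are illustrative assumptions, not values from any particular drive.

```python
# Approximate resistivity of copper at 20 °C, in ohm-metres (handbook value).
RHO_COPPER = 1.68e-8


def resistance_ohm(resistivity: float, length_m: float, area_m2: float) -> float:
    """Resistance of a uniform conductor: R = rho * L / A."""
    return resistivity * length_m / area_m2


def joule_heat_w(current_a: float, resistance: float) -> float:
    """Power turned into heat as current flows through a resistance: P = I^2 * R."""
    return current_a ** 2 * resistance


# Hypothetical circuit-board trace: 50 mm long, 0.2 mm wide, 35 um thick.
trace_r = resistance_ohm(RHO_COPPER, 0.050, 0.2e-3 * 35e-6)

for amps in (0.5, 1.0, 2.0):
    heat = joule_heat_w(amps, trace_r)
    print(f"{amps:.1f} A through {trace_r * 1000:.0f} mOhm -> {heat * 1000:.0f} mW of heat")
```

Because heat grows with the square of the current, doubling the current through the same trace quadruples the heat it produces.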

So how does a material's resistance translate into heat? As electrons flow through a conductor, they collide with the conductor's atoms, and the energy lost in those collisions is released as heat. In metal conductors, resistance also rises as the conductor heats up, which limits the amount of current passing through. Increasing the current to deliver the power needed compensates for this loss, but at a cost: more heat. Without proper planning, this cycle of heat and rising resistance can quickly get out of control.
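The feedback loop can be sketched numerically under simplified, stated assumptions: copper's resistance rises roughly 0.39 percent per °C, the current is held constant, and the conductor warms by a fixed, hypothetical number of degrees for each watt it dissipates (the `rise_per_watt` parameter). The helper `settle_temperature` and all inputs are illustrative. If each round of heating adds less heat than the round before, the temperature settles; if not, it climbs without bound.

```python
# Approximate temperature coefficient of resistance for copper, per °C near 20 °C.
ALPHA_COPPER = 0.0039


def settle_temperature(r20_ohm: float, current_a: float, ambient_c: float,
                       rise_per_watt: float, limit_c: float = 500.0,
                       max_iter: int = 10_000):
    """Follow the heat -> resistance -> heat loop until it settles or runs away.

    r20_ohm is the conductor's resistance at 20 °C. rise_per_watt is an assumed
    number of degrees the conductor warms for each watt it dissipates.
    Returns the settled temperature in °C, or None if it passes limit_c.
    """
    temp = ambient_c
    for _ in range(max_iter):
        r = r20_ohm * (1 + ALPHA_COPPER * (temp - 20.0))  # hotter -> more resistance
        power = current_a ** 2 * r                        # more resistance -> more heat
        new_temp = ambient_c + rise_per_watt * power      # more heat -> hotter
        if new_temp > limit_c:
            return None  # treated as thermal runaway in this model
        if abs(new_temp - temp) < 1e-9:
            return new_temp
        temp = new_temp
    return None


print(settle_temperature(0.12, 1.0, 40.0, 100.0))  # settles a little above 53 °C
print(settle_temperature(0.12, 5.0, 40.0, 100.0))  # None: runaway
```

In this model the outcome hinges on whether the extra heat from each degree of warming is smaller than one degree's worth; raising the current pushes the loop past that point.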

In an electronic circuit, this can have a "knock-on" effect. If one component or conductor heats up, it can affect other components and pathways. Solder joints, for example, can begin to lose conductivity, adding resistance, and therefore more heat, at different points throughout the circuit path.

Can heat-driven fluctuations in resistance corrupt the data on your computer? Experienced data professionals say yes.

Power fluctuations can be illustrated this way. Have you ever been in a house where the lights dimmed because something else drew a large amount of electricity, such as an electric heater or another high-current appliance? For a brief moment, the other appliances do not get their full supply of electricity, and you can see the effect with your own eyes. On a smaller scale, that is what happens inside a computer when heat causes resistance to vary. In a computer, however, the electrical current is carrying your data or powering the components that handle it. When that current fluctuates, your data is at risk: it may be corrupted, your computer may respond slowly, or, even worse, the system may crash.

Here is an example of how even minor current changes could affect your data. Suppose you type the following into your computer:

"The quick brown fox jumped over the lazy dog."

When you press Enter, a small heat-induced change in the circuit's resistance disturbs the signal, and in the blink of an eye the computer records:

"TӘұ -₪i‡k brﻻØn ¿æx ÿï3å˘9 ∂₣‼ t2e lPgy u7g."

The number of characters may be the same, but the characters themselves are jumbled. The currents and signal timing inside a computer are very precise, and if the current changes because heat is causing a conductor to fail, the data will be corrupted. Heat-related problems tend to be sporadic because multiple factors can trigger overheating: sometimes it happens when the computer is under a heavy workload, and sometimes another event causes a thermal overload.
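The effect is easy to simulate. The sketch below is an illustration only, not a model of any real hardware fault: the hypothetical helper `flip_random_bits` inverts a handful of randomly chosen bits in the example sentence. The length stays the same, but every byte that loses or gains a bit becomes a different character.

```python
import random


def flip_random_bits(data: bytes, n_flips: int, seed: int = 0) -> bytes:
    """Return a copy of data with n_flips randomly chosen bits inverted."""
    rng = random.Random(seed)  # seeded so the demonstration is repeatable
    buf = bytearray(data)
    for _ in range(n_flips):
        bit = rng.randrange(len(buf) * 8)  # choose any bit in the buffer
        buf[bit // 8] ^= 1 << (bit % 8)    # and invert it
    return bytes(buf)


original = b"The quick brown fox jumped over the lazy dog."
corrupted = flip_random_bits(original, n_flips=12)

# Latin-1 maps every byte to exactly one character, so the length is preserved.
print(original.decode("latin-1"))
print(corrupted.decode("latin-1"))
print(len(original) == len(corrupted))  # True
```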

The effect of changing current is even more dramatic in hard drives. A hard drive has cache memory, flash ROM, and a drive controller, as well as a main controller that manages the entire drive, including how data is read and written. Data from the computer has to be converted, amplified, and stored in a particular way on the drive's platters. Hard drives have numerous checkpoints to verify that data has been received and written with integrity. However, if the data is corrupted before it reaches those checkpoints, it may be written to the platters in its corrupted form.
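The limitation of those checkpoints can be illustrated with a checksum. Real drives use their own error-detection and error-correction codes; this sketch substitutes CRC-32 from Python's standard `zlib` module purely for illustration, and the function names are hypothetical. A check catches damage that happens after the checksum is computed, but not damage that happened before it.

```python
import zlib


def write_with_checksum(payload: bytes) -> tuple[bytes, int]:
    """Pair the data with a CRC-32 computed at the moment it is received."""
    return payload, zlib.crc32(payload)


def verify(payload: bytes, checksum: int) -> bool:
    """Recompute the CRC-32 and compare it with the stored value."""
    return zlib.crc32(payload) == checksum


data = b"The quick brown fox jumped over the lazy dog."

# Case 1: a bit flips after the checksum was computed, so the check catches it.
stored, crc = write_with_checksum(data)
damaged = bytes([stored[0] ^ 0x01]) + stored[1:]
print(verify(damaged, crc))   # False

# Case 2: a bit flipped before the checksum was computed, so the checksum
# faithfully describes data that is already wrong, and the check passes.
pre_damaged = bytes([data[0] ^ 0x01]) + data[1:]
stored2, crc2 = write_with_checksum(pre_damaged)
print(verify(stored2, crc2))  # True
```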

Inside the hard drive, heat comes from three areas: the spindle motor, the voice coil motor, and the electronic circuit board. The spindle motor in today's hard drives spins the platters at a standard 7,200 revolutions per minute (RPM); for comparison, the average automobile redlines at between 5,600 and 6,500 RPM. Drives are also made with spindle speeds of 10,000 and 15,000 RPM. While these speeds used to be available only in high-performance server drives (such as SCSI drives), there are now consumer hard drives that spin at 10,000 RPM. To handle such high speeds, drive manufacturers use special bearings in the spindle motor to minimize heat. Even with this precaution, spindle rotation is a major source of heat in a hard drive.
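For a sense of scale, the sketch below converts spindle speed into the linear speed at the outer edge of a platter. The 95 mm platter diameter is an assumption (roughly the size used in 3.5-inch drives), and the helper name is illustrative; faster drives often use smaller platters, which this simple calculation ignores.

```python
import math


def edge_speed_m_per_s(rpm: float, platter_diameter_m: float) -> float:
    """Linear speed at the outer edge of a spinning platter."""
    revs_per_second = rpm / 60
    circumference = math.pi * platter_diameter_m
    return revs_per_second * circumference


PLATTER_DIAMETER_M = 0.095  # assumed: roughly the platter size in a 3.5-inch drive

for rpm in (7_200, 10_000, 15_000):
    v = edge_speed_m_per_s(rpm, PLATTER_DIAMETER_M)
    print(f"{rpm:>6} RPM: {v:5.1f} m/s ({v * 3.6:5.1f} km/h)")
```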

The next source of heat is the voice coil motor (VCM), the device that moves the hard drive's heads back and forth across the platters at dizzying speed. It takes a burst of electrical current to the VCM to get the heads to the right area, and just as much current to slow them and stop them over that exact data area. The heads of an active hard drive are constantly in motion, starting and stopping over precise areas to read or write data. To picture this, imagine a car with no brakes, one with only "forward" and "reverse." To stop a car like that at an exact spot, you would have to shift from "forward" to "reverse" very quickly to halt the vehicle. Shifting back and forth like that would generate a lot of friction and heat. The VCM performs essentially the same action every time a read or write request is made. During heavy disk activity, much of the noise you hear from your computer is the VCM doing this work.
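The forward/reverse analogy corresponds to what control engineers call a bang-bang move: full acceleration for the first half of the distance, then full deceleration for the second. The sketch below models a seek that way and estimates the coil heat as I²Rt. It is a simplification: every numeric input and the function name are hypothetical, and real drives use more refined seek profiles.

```python
import math


def bang_bang_seek(distance_m: float, accel_m_s2: float,
                   coil_current_a: float, coil_resistance_ohm: float):
    """Minimum-time move: full acceleration for half the distance, full braking for the rest.

    Returns (seek_time_s, coil_heat_j), assuming the coil carries the same
    current magnitude while speeding up and while slowing down.
    """
    half_time = math.sqrt(distance_m / accel_m_s2)  # from d/2 = a * t**2 / 2
    seek_time = 2 * half_time
    heat = coil_current_a ** 2 * coil_resistance_ohm * seek_time  # I^2 * R * t
    return seek_time, heat


# Hypothetical values: 10 mm move, 500 m/s^2, 0.5 A through a 5-ohm coil.
t, q = bang_bang_seek(0.010, 500.0, 0.5, 5.0)
seeks_per_second = 100  # also hypothetical: a busy drive
print(f"seek: {t * 1000:.1f} ms, heat per seek: {q * 1000:.1f} mJ, "
      f"about {q * seeks_per_second:.2f} W at {seeks_per_second} seeks/s")
```

Because the coil must push just as hard to stop the heads as to start them, the heat is generated for the whole duration of every seek, not only while the heads are speeding up.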

The final source of heat in the hard drive is the electronic circuit board that manages its operation. A standard desktop hard drive draws direct current (DC) from two supply rails, +5 volts and +12 volts (17 volts combined); the electronics run mainly on the 5-volt rail, while the motors draw mainly on the 12-volt rail. Like all electronic circuits, the board gives off heat. Because the circuit board is mounted directly on the hard disk assembly, or chassis, its heat commonly spreads throughout the entire assembly.
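Power on each rail is voltage times current, and virtually all of it is eventually released as heat inside the drive. The currents in this sketch are placeholder assumptions, not figures for any particular model (a drive's specification sheet lists its actual draw), and the helper name is illustrative.

```python
def drive_power_watts(rails: dict[str, tuple[float, float]]) -> float:
    """Sum volts * amps over each supply rail; nearly all of it ends up as heat."""
    return sum(volts * amps for volts, amps in rails.values())


# Hypothetical operating currents for a 3.5-inch drive (check your spec sheet).
rails = {
    "+5 V (electronics)": (5.0, 0.6),
    "+12 V (motors)": (12.0, 0.5),
}
total = drive_power_watts(rails)
print(f"Total: {total:.1f} W")  # 5 * 0.6 + 12 * 0.5 = 9.0 W
```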

 

Hard Drives and Heat Don’t Mix

Most computers built in the last three to five years have onboard temperature sensors that you can monitor either from the computer's BIOS (usually under "PC Health") or with software from the computer manufacturer. There are usually at least two sensors on the main board: one for the processor (CPU) and an ambient sensor that measures the temperature inside the computer case. The ambient sensor can keep you informed of your computer's internal temperature.
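On Linux, motherboard sensors are typically exposed through the kernel's hwmon interface under `/sys/class/hwmon`, with temperatures reported in thousandths of a degree Celsius. The sketch below reads them using only the standard library; it assumes a Linux system with the appropriate sensor driver loaded, and sensor names and availability vary by motherboard. The function names are illustrative. On other operating systems, the manufacturer's monitoring software remains the practical route.

```python
from pathlib import Path


def _read(path: Path):
    """Return a sysfs file's contents stripped, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_hwmon_temps(root: str = "/sys/class/hwmon") -> dict:
    """Collect temperature readings in °C from Linux's hwmon sysfs interface.

    Each hwmonN directory may hold temp*_input files in millidegrees Celsius,
    plus a "name" file identifying the sensor chip. Returns an empty dict when
    the interface is missing (for example, on non-Linux systems).
    """
    readings = {}
    base = Path(root)
    if not base.is_dir():
        return readings
    for chip in sorted(base.iterdir()):
        if not chip.is_dir():
            continue
        chip_name = _read(chip / "name") or chip.name
        for temp_file in sorted(chip.glob("temp*_input")):
            raw = _read(temp_file)
            if raw is None or not raw.lstrip("-").isdigit():
                continue  # unreadable or non-numeric sensor; skip it
            label = temp_file.name[: -len("_input")]
            readings[f"{chip_name}/{label}"] = int(raw) / 1000
    return readings


if __name__ == "__main__":
    for sensor, celsius in read_hwmon_temps().items():
        print(f"{sensor}: {celsius:.1f} °C")
```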

What temperature is too hot for a hard drive? It varies by manufacturer and model; you can view or download your drive's specifications from the manufacturer's Web site. Among the common brands, the normal operating range is typically around 41° F to 140° F (5° C to 60° C). Faster, high-performance drives tend to run at higher temperatures. Because the range varies from drive to drive, be sure to look up your specific model.
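Once you have your drive's rated range, checking a reading against it is simple. In this sketch the 5 °C to 60 °C defaults are the typical range quoted above, while the 5-degree warning margin and the function names are hypothetical choices; substitute the figures from your own drive's specification sheet.

```python
def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


def check_drive_temp(reading_c: float, min_c: float = 5.0, max_c: float = 60.0,
                     margin_c: float = 5.0) -> str:
    """Classify a reading against an operating range.

    min_c/max_c default to the typical range cited in the text; margin_c is an
    assumed warning band inside each limit.
    """
    if reading_c < min_c or reading_c > max_c:
        return "out of range"
    if reading_c > max_c - margin_c or reading_c < min_c + margin_c:
        return "near limit"
    return "ok"


print(f_to_c(41), f_to_c(140))        # 5.0 60.0
print(check_drive_temp(38.0))         # ok
print(check_drive_temp(57.0))         # near limit
print(check_drive_temp(f_to_c(150)))  # out of range
```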

What happens when a hard drive overheats? Thermal stress can warp the hard disk assembly. Every material can sustain only so much heat before it warps. Mechanical engineers describe how much a material expands as it heats by its coefficient of thermal expansion (CTE), and they account for it when setting tolerances during design. Quite literally, the heat a hard drive produces or absorbs from its environment can warp the hard disk assembly, or chassis, which is made of cast aluminum. Because the platters, motor, and head assembly all attach to the chassis, thermal expansion and warpage can cause alignment problems.
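Linear expansion follows ΔL = α·L·ΔT. The sketch below uses approximate handbook coefficients for aluminum and steel; the 100 mm span, the 40 °C rise, and the pairing of the two metals are hypothetical choices meant only to show how parts made of different materials drift apart as they heat. The helper name is illustrative.

```python
# Approximate handbook coefficients of linear thermal expansion, per °C.
CTE = {"aluminum": 23e-6, "steel": 12e-6}


def expansion_um(material: str, length_mm: float, delta_t_c: float) -> float:
    """Change in length in micrometres: dL = alpha * L * dT."""
    return CTE[material] * length_mm * 1000 * delta_t_c


# Hypothetical: a 100 mm span warming by 40 °C.
al = expansion_um("aluminum", 100, 40)
st = expansion_um("steel", 100, 40)
print(f"aluminum: {al:.0f} um, steel: {st:.0f} um, mismatch: {al - st:.0f} um")
```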

Misalignment affects how data is read from the platters and can severely affect the bearings of the spindle motor. Bearing seizure is dangerous to the platters because the tiny magnetic heads "fly" above the platters on the air flow created by the spinning platters. If the rotation speed is not maintained, the heads can crash into the platter surface.

On the electronics side, overheating can break connections or increase the resistance within a connection. This can lead to malfunctions and to error rates that climb the longer high temperatures persist. These errors affect both hard drive operation and data integrity.

Together, these errors cause the hard drive to keep repositioning the heads with the VCM to retry reads and writes, which in turn produces more heat. Despite the heat, the drive will stay the course and keep working through its reading and writing tasks. Older drives used to work themselves to death. Modern drives use SMART (Self-Monitoring, Analysis and Reporting Technology), special self-monitoring circuitry and firmware on the drive, to track their own condition and warn of problems before they lead to a crash.
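On systems with the open-source smartmontools package installed, `smartctl -A` followed by a device name prints a drive's SMART attribute table, and many drives report their temperature as attribute 194 (Temperature_Celsius). The parser below works on that table. The sample text is illustrative, the function name is hypothetical, and attribute numbering and raw-value formats vary by vendor, so treat this as a sketch.

```python
from typing import Optional


def smart_temperature(smartctl_output: str) -> Optional[int]:
    """Extract the drive temperature (°C) from `smartctl -A` attribute output.

    Looks for attribute 194; its raw value is the tenth column and may be
    followed by extra text such as "(Min/Max 18/50)".
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "194" and fields[9].isdigit():
            return int(fields[9])
    return None


sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0022   036   050   000    Old_age   Always       -       36 (Min/Max 18/50)
"""
print(smart_temperature(sample))  # 36
```

In practice the text would come from running `smartctl -A` against the drive (for example through Python's `subprocess.run` with `capture_output=True, text=True`), which usually requires administrator privileges.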

Deviation from the recommended operating temperature has an impact on a hard drive’s reliability. A vendor study on this subject mentioned that a hard drive operating five degrees above the recommended operating temperature could increase the failure rate by 10 to 15 percent. This has led drive manufacturers to begin placing temperature sensors on the hard disk electronics to monitor the operating temperature of the unit.
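The vendor figure can be extended with a little arithmetic under one loudly stated assumption: that the 10 to 15 percent increase compounds for every additional 5 degrees over the recommended temperature. The study as cited does not say this; it is a hypothetical model, with an illustrative function name, used only to show how quickly the risk could grow if it held.

```python
def failure_multiplier(degrees_over: float, increase_per_5c: float) -> float:
    """Relative failure rate, ASSUMING each 5 °C over spec compounds the increase."""
    return (1 + increase_per_5c) ** (degrees_over / 5)


for over in (5, 10, 15):
    low = failure_multiplier(over, 0.10)
    high = failure_multiplier(over, 0.15)
    print(f"{over:>2} °C over: x{low:.2f} to x{high:.2f}")
```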

So far we have discussed only the heat a hard drive produces itself. Right now, inside your desktop computer, there are other sources of heat, such as the processor and the video card. The PC case effectively becomes a dry sauna for all of the components, and your hard drive is sitting in this heat.

 

Air Flow and Other Cooling Considerations

When considering the air flow and cooling details of your computer, first consider the ambient air temperature of where the computer is sitting. Where is your computer? Under the desk or tucked away in a cabinet? If you have a notebook computer, where will that be? Find out what the air temperature around the computer is. Although you can install an extra fan inside the case, it won’t decrease the temperature of the air that is coming into the case from its surrounding environment.
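A common way to estimate this is the heat balance for air: as air passes through the case, its temperature rises by P / (ρ·cp·V̇), where P is the heat load, ρ and cp are the density and specific heat of air, and V̇ is the volume flow rate. The sketch below uses approximate values for air; the 200 W load, the 50 CFM airflow, and the function name are hypothetical. More airflow shrinks the rise, but no fan can bring the exhaust below the intake temperature.

```python
AIR_DENSITY = 1.2              # kg/m^3, approx. at room temperature and sea level
AIR_SPECIFIC_HEAT = 1005       # J/(kg*K), approx. for dry air
CFM_TO_M3_PER_S = 0.000471947  # 1 cubic foot per minute in m^3/s


def exhaust_temp_c(intake_c: float, heat_w: float, airflow_cfm: float) -> float:
    """Temperature of air leaving the case: intake plus P / (rho * cp * flow)."""
    flow_m3_s = airflow_cfm * CFM_TO_M3_PER_S
    rise = heat_w / (AIR_DENSITY * AIR_SPECIFIC_HEAT * flow_m3_s)
    return intake_c + rise


# Hypothetical 200 W system with 50 CFM of airflow, in a cool and a warm room.
for room in (22.0, 32.0):
    print(f"room {room:.0f} °C -> exhaust {exhaust_temp_c(room, 200, 50):.1f} °C")
```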

If the computer is placed in a cabinet or closet, the limited airflow will affect it: the heated air exhausted from the back of the case has nowhere to go and is drawn right back into the computer.
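A simple mixing model shows why this compounds. Assume a fixed fraction of the air the computer draws in is its own exhaust and the rest is room air, and that each pass through the case adds the same temperature rise. Solving intake = fraction × (intake + rise) + (1 − fraction) × room gives the steady-state intake temperature. The model, the 22 °C room, the 7 °C rise, and the function name are all hypothetical; within this model, the intake temperature climbs steeply as the recirculated fraction approaches 100 percent.

```python
def recirculated_intake_c(room_c: float, rise_c: float, fraction: float) -> float:
    """Steady-state intake temperature when part of the exhaust is re-ingested.

    Assumed model: intake = fraction * exhaust + (1 - fraction) * room,
    with exhaust = intake + rise. Solving gives room + fraction*rise/(1-fraction).
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    return room_c + fraction * rise_c / (1 - fraction)


for f in (0.0, 0.5, 0.8):
    print(f"{f:.0%} recirculated: intake {recirculated_intake_c(22, 7, f):.1f} °C")
```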

Maintaining the best airflow requires checking the computer regularly for dust buildup. Dust can gather in and around the intake vents and around the internal fans. Today's high-performance computers have fans on the processor and the video card, and the main board may have smaller fans cooling the chipset. A regular inspection should confirm that all intakes are free of dust and that every fan is spinning freely at its normal speed. If just one fan fails, heat will quickly build up.
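Part of that inspection can be automated if your monitoring software reports fan speeds. The sketch below compares each reading with an expected speed you supply; the fan names, the expected speeds, the 80 percent threshold, and the function name are all hypothetical, and because many fans are speed-controlled, "expected" should reflect normal operating conditions rather than a fan's maximum.

```python
def check_fans(readings_rpm: dict, expected_rpm: dict,
               min_fraction: float = 0.8) -> dict:
    """Flag fans that have stopped or are running well below their expected speed."""
    status = {}
    for fan, expected in expected_rpm.items():
        rpm = readings_rpm.get(fan, 0)  # a missing reading is treated as stopped
        if rpm == 0:
            status[fan] = "FAILED"
        elif rpm < min_fraction * expected:
            status[fan] = "slow (check for dust or wear)"
        else:
            status[fan] = "ok"
    return status


expected = {"cpu": 2000, "case_rear": 1200, "video": 1800}  # hypothetical
readings = {"cpu": 1950, "case_rear": 700, "video": 0}      # hypothetical
for fan, state in check_fans(readings, expected).items():
    print(f"{fan}: {state}")
```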

Optimal airflow inside the computer case is achieved by bringing fresh air in through the front and exhausting it out the back. Server manufacturers use the same principle for rack-mounted servers. PC case manufacturers also offer side- and top-mounted exhaust options so that computer builders can install more fans. For notebook users, a docking station helps maintain good airflow by raising the back of the notebook slightly so that air can circulate around the entire unit.

Other cooling solutions use specially designed coolers, either to cool components directly or to introduce refrigerated air. Some cooling products use water as the cooling medium; there are water-cooled doors for server cabinets, for example. One company has designed a cooling system that sprays a fine mist of non-conductive, non-corrosive fluid onto the surfaces of heated components.

Computers in harsh environments, such as industrial work areas with high ambient temperatures, should have a thermal solution that keeps all of their components within their individual operating temperature ranges. Conversely, locations colder than the minimum operating temperature of the computer's components can be just as critical.

For example, in the food industry and biomedical fields, inventory computers may be located in extremely cold environments, and those conditions must be managed too. Environmental thermal management, whether for heat or cold, protects users' data, extends hardware lifecycles, and reduces electrical power consumption.

Recovery for Overheated Drives

Can the data on an overheated drive be recovered? In many cases, yes. (Although this article focuses on overheated drives, data can even be restored from fire-damaged drives.) The biggest risk to the hard drive is powering it up after an extreme heat situation. If a computer system overheats, the safest course of action is to get the hard drive to a professional data recovery company rather than attempting a recovery on your own. Why?

There have been reports of people putting hard drives in freezers in an attempt to get them working again. This puts the data at risk, for a couple of reasons. First, hard drives maintain a controlled atmosphere inside the hard disk assembly, and extreme temperature changes can cause moisture to condense on the moving parts. Because the heads fly so close to the platters, any moisture on the platters or heads increases the chance of a head crash.
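Condensation forms on any surface colder than the dew point of the surrounding air. The sketch below estimates the dew point with the widely used Magnus approximation; the room conditions, the freezer temperature, and the function name are assumed values for illustration. A drive taken from a freezer sits far below the dew point of typical indoor air, so moisture can readily condense on it.

```python
import math


def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Dew point via the Magnus approximation (b = 17.62, c = 243.12 °C)."""
    b, c = 17.62, 243.12
    gamma = math.log(rel_humidity_pct / 100) + b * temp_c / (c + temp_c)
    return c * gamma / (b - gamma)


room_dew = dew_point_c(22, 50)  # about 11 °C for a 22 °C room at 50 % humidity
freezer_c = -18                 # assumed temperature of a drive fresh from a freezer
print(f"dew point {room_dew:.1f} °C; condensation likely: {freezer_c < room_dew}")
```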

A second reason for not putting an overheated hard drive in the freezer is that the overheating may have caused the hard disk assembly to become warped. When the hard drive is powered up after cooling, the moving parts may ‘wobble.’ This can cause a head crash.

Professional data recovery engineers who specialize in the electronics and mechanics of hard drives will analyze the damage thoroughly before powering up a drive. Using proprietary techniques and tools, skilled engineers minimize the risks of starting the hard drive after it has overheated.

During a disaster involving overheated equipment, be proactive about resources and recovery procedures. For instance, a client with a large datacenter experienced a thermal disaster this past summer.

This datacenter had more than two petabytes of storage across 9,000 hard drives. During the summer heat, the power grid serving the facility began to fail under the community's energy demands, and the datacenter experienced power fluctuations.
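A quick calculation gives a sense of the scale involved. The storage and drive counts come from the account above (treating "more than two petabytes" as a lower bound in decimal units); the per-drive wattage is a hypothetical average, and the drives are only part of the facility's total heat load.

```python
TOTAL_BYTES = 2e15  # "more than two petabytes", decimal units assumed
DRIVES = 9_000

avg_capacity_gb = TOTAL_BYTES / DRIVES / 1e9
print(f"average per drive: at least {avg_capacity_gb:.0f} GB")

WATTS_PER_DRIVE = 10  # hypothetical average draw per drive
drive_heat_kw = DRIVES * WATTS_PER_DRIVE / 1000
print(f"heat from the drives alone: about {drive_heat_kw:.0f} kW")
```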

The fluctuating power was not severe enough to trigger the uninterruptible power supply (UPS) or generator systems, yet it had an undetected effect on the fans of an air conditioning unit. When the grid supply became unstable and failed, the UPS and generator systems kicked in. Unfortunately, the A/C fans could not handle the power change and failed.

The datacenter IT staff noticed the temperature rising and began executing their disaster recovery plan, including a safe shutdown of the servers. Yet with so many servers and hard drives, the temperature rose faster than shutting down the servers could reduce the heat. The datacenter's well-written disaster plan had contingencies for overheated hard disks, including a step to call a data recovery provider.

The vendor of the storage equipment went through the remote logs of the servers and storage units before they were shut down and began checking each system carefully. Once the servers were safely shut down, assessments were made from a facility and equipment perspective.

With the A/C units repaired and temperatures returning to normal, suspect drives were replaced and the datacenter was methodically brought back online. In this case, the datacenter did not require data recovery services; however, its disaster plan called for engaging a professional data recovery company early in the recovery process.

The success and minimal liabilities of this datacenter's response to its thermal disaster show the importance of a well-thought-out disaster recovery plan. Successful disaster recovery and business continuity planning require that all internal and external resources, including data recovery services, be used as early as possible.

Thermal management is an important consideration for all computer users. Whether you have computers at home or at work, or have one computer or hundreds of servers, understanding the heat tolerances for your equipment is vital. The benefits of protecting your computer and hard disk from temperature extremes are that your data will be preserved and the equipment life will be extended.

Sean Barry is the remote data recovery manager of North America for Ontrack Data Recovery.



"Appeared in DRJ's Spring 2007 Issue"