Resolving an over temperature problem for a water-cooled 8335-GTW or 8335-GTX system
Learn how to identify the service action that is needed to resolve an over temperature problem.
Procedure
-
Go to Water cooling system specification and requirements. Are all of the requirements for water-cooled systems met?
Yes: Continue with the next step.
No: Work with the customer to ensure that all of the requirements for water-cooled systems are met. This ends the procedure.
-
Is the room temperature less than 40°C (104°F)?
Yes: Continue with the next step.
No: Notify the customer. The customer must bring the room temperature within normal range. Continue with the next step.
-
Do you have an 8335-GTX system?
Yes: You might be required to set the thermal mode of the system to a setting other than the default setting, depending on your system, adapter, and cable type. For details, see Determining and setting the thermal mode for an 8335-GTG, 8335-GTH, or 8335-GTX system. If the problem persists, continue with the next step.
No: Continue with the next step.
-
Ensure that the following requirements are met:
- The quick-connects between the 8335-GTW or 8335-GTX system and the water manifold are mated and connected to the proper circuits of the manifold. The supply hose must be connected to the supply manifold circuit, which is the manifold circuit that is located toward the inside of the rack. The return hose must be connected to the return manifold circuit, which is the manifold circuit that is located toward the outside of the rack.
- The facility water supply hose is properly connected to the supply hose on the manifold, and the return hose on the manifold is properly connected to the facility water return hose.
- The ball valves that connect the facility water supply hose to the manifold supply hose and the facility water return hose to the manifold return hose are open. For more information about connecting the facility water hoses to the manifold hoses, see Replacing the water manifold in the 8335-GTW or 8335-GTX.
- All of the valves that might restrict the flow of water through the hoses are open in the facility water system.
- The pumping unit of the facility water system is on and does not have errors.
- The facility water system is supplying water at the required temperature and flow.
Does the problem persist?
Yes: Continue with the next step. Note: Steps 1 - 4 resolve most problems. Ensure that you carefully check steps 1 - 4 before you continue with the next step.
No: This ends the procedure.
-
Is a single memory DIMM, power supply, or voltage regulator overheating?
Yes: Replace the overheating item. If your system is an 8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, or 8335-GTX, go to 8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, or 8335-GTX locations to identify the physical location and the removal and replacement procedure. This ends the procedure.
No: Continue with the next step.
-
Is a processor overheating, but the other processor and the graphics processing units (GPUs) are not overheating?
Yes: Check the thermal interface material (TIM) between the cold plate and the processor that is overheating. Go to Removing a system processor module from the 8335-GTW or 8335-GTX system and complete the steps to lift the cold plate off the processor. If the TIM pad is damaged, replace the TIM pad. To replace a TIM pad, go to Replacing a system processor module in an 8335-GTW or 8335-GTX system and complete the steps for removing and installing a new TIM pad. This ends the procedure.
No: Continue with the next step.
-
Is a GPU overheating, but the other GPUs and the processors are not overheating?
Yes: Replace the thermal interface material (TIM) between the cold plate and the GPU that is overheating. Go to Removing the graphics processing unit from a water-cooled 8335-GTW or 8335-GTX system and complete the steps to lift the cold plate off the GPU. Then, go to Replacing the graphics processing unit in a water-cooled 8335-GTW or 8335-GTX system and complete the steps for installing a new TIM pad. If the problem is not resolved, replace the GPU. For instructions about replacing a GPU, see Removing and replacing a graphics processing unit in the 8335-GTW or 8335-GTX. This ends the procedure.
No: Continue with the next step.
-
Replace the cold plates. For instructions about how to replace the cold plates, see Removing and replacing the cold plates in the 8335-GTW or 8335-GTX.
Does the problem persist?
Yes: Go to Contacting IBM service and support. This ends the procedure.
No: This ends the procedure.
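
The component-level decisions in the last steps of this procedure (single DIMM, power supply, or voltage regulator; single processor; single GPU; otherwise cold plates) can be summarized as a small lookup. The following Python sketch is illustrative only and is not an IBM tool: the function names, component labels such as "dimm3" or "gpu2", and the action strings are all hypothetical. It also assumes that "only this item is overheating" means exactly one component is reported, and it does not cover the earlier checks of water cooling requirements, room temperature, thermal mode, and hose connections.

```python
import re

# Hypothetical action labels that paraphrase the component-level steps.
REPLACE_ITEM = "replace the overheating item"
CHECK_CPU_TIM = "check the TIM pad under the processor cold plate; replace the pad if damaged"
REPLACE_GPU_TIM = "replace the TIM pad under the GPU cold plate; if unresolved, replace the GPU"
REPLACE_COLD_PLATES = "replace the cold plates"

# Hypothetical component categories, not IBM identifiers.
SINGLE_ITEMS = {"dimm", "power_supply", "voltage_regulator"}


def category(label: str) -> str:
    """Strip a trailing index, so that "gpu2" becomes "gpu"."""
    return re.sub(r"\d+$", "", label)


def triage(overheating: list[str]) -> str:
    """Return the service action for the reported overheating components."""
    if not overheating:
        raise ValueError("no overheating component reported")
    if len(overheating) == 1:
        kind = category(overheating[0])
        # A single DIMM, power supply, or voltage regulator is overheating.
        if kind in SINGLE_ITEMS:
            return REPLACE_ITEM
        # One processor is overheating and nothing else is.
        if kind == "processor":
            return CHECK_CPU_TIM
        # One GPU is overheating and nothing else is.
        if kind == "gpu":
            return REPLACE_GPU_TIM
    # Several components (or an unlisted one) are overheating.
    return REPLACE_COLD_PLATES
```

For example, under these assumptions `triage(["gpu2"])` returns the GPU TIM action, and `triage(["processor0", "gpu1"])` falls through to replacing the cold plates.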