A Guide to Data Center Cooling Systems
Modern IT demands, such as big data processing and machine learning workloads, have placed ever-greater loads on data centers in recent years, even before accounting for the rapid rise in artificial intelligence (AI) usage. Consequently, data center cooling systems have become an even more critical component of data center design, since the newest systems generate more heat than previous generations.
This guide covers all aspects of data center cooling, starting from traditional air handling and moving to advanced liquid immersion cooling. Discover more about airflow management, today’s advanced liquid cooling solutions, and the importance of infrastructure design in modern data center management.
Data Center Heat Management Principles
Effective cooling system design begins with an understanding of heat transfer. The three heat transfer mechanisms are conduction, convection, and radiation. Conduction transfers thermal energy through solid material, such as from a processor to a heat spreader or heat sink. Convection then moves the heat into the surrounding air or liquid, while radiation is a small but measurable factor.
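As a rough illustration of the conduction step, Fourier's law gives the heat a flat plate can carry: Q = k · A · ΔT / L. The sketch below uses assumed, illustrative values (a copper heat spreader with k ≈ 400 W/m·K and a 5 K temperature drop), not figures from any specific product.

```python
# Steady-state conduction through a flat plate, per Fourier's law:
# Q = k * A * dT / L. All numeric values below are illustrative assumptions.

def conduction_watts(k_w_per_m_k: float, area_m2: float,
                     delta_t_k: float, thickness_m: float) -> float:
    """Heat conducted through a flat plate, in watts."""
    return k_w_per_m_k * area_m2 * delta_t_k / thickness_m

# Example: a 40 mm x 40 mm copper heat spreader (k ~ 400 W/m·K),
# 2 mm thick, with a 5 K drop across it.
q = conduction_watts(400.0, 0.04 * 0.04, 5.0, 0.002)
print(f"{q:.0f} W")  # 1600 W
```

The result shows why copper and thin cross-sections dominate heat spreader design: high conductivity and short conduction paths move kilowatt-scale heat with only a few degrees of temperature drop.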
Fluid flow, whether of air or liquid, is the most important factor in every cooling method. Good thermal management starts with moving the cooling medium over the right surfaces and extracting the waste heat efficiently.
Traditional Methods of Cooling
Many organizations still rely on older methods, particularly those with moderate workloads and established systems.
Computer Room Air Conditioning (CRAC)
CRAC units were the foundation of early data center cooling. They draw in warm return air, cool it with direct expansion systems, and redistribute the chilled air back into the room. The approach is simple and effective for lower-density deployments.
The drawback is inefficiency at higher loads. As rack power density climbs beyond 5–10 kW, CRAC units often struggle to maintain even temperature setpoints across the room without overcooling or driving up cooling energy costs.
Computer Room Air Handling Systems
Computer room air handler (CRAH) systems use chilled water from external chillers, moving mechanical refrigeration out of the IT space and achieving greater efficiency in medium to large facilities. CRAHs integrate into the building's chilled water loops, enabling more precise control of the facility environment.
These systems require a larger investment in facility systems integration and infrastructure coordination. Load distribution, regional climate, and airflow containment strategies also affect a CRAH system's effectiveness.
Airflow Management Strategies
Cooling optimization depends to a large extent on effective airflow management. With streamlined airflow, operators can better control efficiency and pinpoint hot spots.
Containment: Hot Aisle and Cold Aisle
By eliminating the mixing of hot and cold airflow streams, containment enhances predictability. Operators can control cooling better by directing the cold supply to the server inlets and containing the exhaust in a hot aisle.
Cold aisle containment is easier to retrofit, while hot aisle containment provides stronger thermal separation but requires more structural changes. Both strategies enable higher rack density without a proportional increase in cooling energy costs.
Raised Floor vs. Overhead Distribution
Raised floors with perforated tiles were the original standard for chilled air distribution. Overhead systems were developed later as an alternative. While still employed, raised floors face the challenge of air leakage and the uneven distribution of air.
Overhead cooling, with ducts or diffusers, minimizes leakage and simplifies cabling routes. The choice between the two is dictated by legacy limitations and the structural layout of the facility, along with airflow considerations. Increasingly, operators are moving towards hybrid models with supplemental containment systems.
Liquid Cooling: The Next Step in Density Management
As rack density continues to increase, air-based cooling systems begin to plateau. Liquid cooling is far more effective because it extracts heat directly at the source, enabling much greater efficiency and density.
Direct-to-Chip Liquid Cooling
Direct-to-chip liquid cooling (DLC) systems use cold plates mounted on processors and other high-power chips. Circulating coolant absorbs the heat and carries it away through a closed loop. Rack densities well over 30 to 40 kW can be achieved this way, significantly beyond what air cooling can support. Some current-generation racks require more than 100 kW of heat removal, making DLC an increasingly important solution for modern data centers.
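To get a feel for the numbers involved, the required coolant flow follows from the heat balance Q = ṁ · c_p · ΔT, so ṁ = Q / (c_p · ΔT). The sketch below assumes water coolant and a hypothetical 100 kW rack with a 10 K coolant temperature rise; all figures are illustrative.

```python
# Coolant flow needed to carry a heat load: m_dot = Q / (c_p * dT).
# Water properties are approximate; rack load and dT are assumptions.

WATER_CP = 4186.0     # J/(kg·K), specific heat of water
WATER_DENSITY = 1.0   # kg/L, approximate

def flow_l_per_min(heat_w: float, delta_t_k: float) -> float:
    """Water flow (L/min) required to absorb heat_w with a delta_t_k rise."""
    kg_per_s = heat_w / (WATER_CP * delta_t_k)
    return kg_per_s / WATER_DENSITY * 60.0

# A hypothetical 100 kW rack with a 10 K coolant temperature rise:
print(f"{flow_l_per_min(100_000, 10):.0f} L/min")  # ~143 L/min
```

Roughly 140 L/min per rack is modest plumbing compared with the thousands of cubic meters of air per hour that the same load would demand, which is the core reason liquid wins at high density.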
Immersion Cooling
Immersion cooling submerges servers in dielectric fluid. In single-phase systems, the fluid is circulated through external heat exchangers. In two-phase systems, the fluid boils at the hot components and condenses on a cooled coil, yielding even greater efficiency.
Useful for high-density workloads, such as high performance computing (HPC) and AI clusters, immersion cooling has some notable upsides. For example, it decreases noise, eliminates server fans, and allows for dense packing. Despite this, the handling of specialized fluids, servicing equipment, and the required industrial knowledge pose certain obstacles for some IT professionals.
Hybrid Cooling Approaches
Not every facility is ready for a full liquid cooling solution. Hybrid strategies let operators blend air and liquid approaches to match their workloads and budget.
For instance, CPUs can be cooled with direct liquid cold plates while air cooling is maintained for memory and storage. Some facilities pair rear door heat exchangers with aisle containment. These approaches are common in AI deployments, where GPU racks run on liquid loops alongside standard air-cooled racks.
Energy Efficiency and Sustainability in Cooling
As with other aspects of IT infrastructure, cooling systems are assessed for their energy efficiency and environmental impact. Operators have to balance performance, efficiency, and sustainability.
Power Usage Effectiveness (PUE) and Cooling
PUE, the ratio of total facility energy to IT equipment energy, is a critical metric for data center cooling. Reducing the cooling energy requirement directly lowers operational costs. Containment, liquid cooling, and reducing wasted airflow all contribute to strong PUE management. Addressing overprovisioning, particularly in legacy cooling systems, also yields environmental gains through lower overall energy consumption. With advanced approaches such as direct-to-chip liquid cooling and immersion, PUE values approaching 1.0 are now achievable.
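The PUE calculation itself is simple, as the sketch below shows. The energy figures are made-up illustrations, not measurements of any real facility.

```python
# PUE = total facility energy / IT equipment energy. A value of 1.0 would
# mean zero overhead; cooling is usually the largest non-IT share.
# All kWh figures below are illustrative assumptions.

def pue(it_kwh: float, cooling_kwh: float, other_kwh: float) -> float:
    """Power Usage Effectiveness for one measurement period."""
    return (it_kwh + cooling_kwh + other_kwh) / it_kwh

legacy = pue(it_kwh=1000, cooling_kwh=700, other_kwh=100)     # 1.8
contained = pue(it_kwh=1000, cooling_kwh=300, other_kwh=100)  # 1.4
liquid = pue(it_kwh=1000, cooling_kwh=80, other_kwh=50)       # 1.13
print(legacy, contained, liquid)
```

The three scenarios mirror the progression the article describes: an uncontained legacy room, an air-cooled room with containment, and an advanced liquid-cooled deployment.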
Free Cooling and Economizers
Economizers use outside air or water to supplement mechanical cooling whenever outdoor conditions are more favorable than the required supply temperature. Airside economizers filter and condition external air, while waterside economizers use cooling towers or dry coolers to reject heat.
Water Usage and Its Impact on the Environment
Water use is becoming an important metric. Traditional evaporative cooling consumes large volumes of water. Operators are shifting to air-cooled chillers, hybrid cooling towers, and more advanced water treatment to reduce Water Usage Effectiveness (WUE), the ratio of water consumed to IT energy delivered.
Along with PUE, WUE is often part of sustainability reporting. Data center managers must weigh water conservation against energy use, especially in regions where water is scarce.
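WUE is as easy to compute as PUE. The sketch below uses hypothetical monthly figures for evaporative make-up water and IT energy; the numbers are assumptions for illustration only.

```python
# WUE = liters of water consumed / kWh of IT energy (units: L/kWh).
# A lower value is better. Figures below are illustrative assumptions.

def wue(water_liters: float, it_kwh: float) -> float:
    """Water Usage Effectiveness for one measurement period."""
    return water_liters / it_kwh

# A hypothetical month: 1.8 million liters of evaporative make-up water
# against 1.0 GWh of IT load.
print(wue(1_800_000, 1_000_000), "L/kWh")  # 1.8 L/kWh
```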
Adiabatic and Geothermal Cooling
There is a trend towards more sustainable approaches beyond conventional air and liquid cooling. These include adiabatic cooling and geothermal loops, which use natural means to improve efficiency.
Adiabatic (Indirect Evaporative) Cooling
Indirect evaporative cooling, or adiabatic cooling, reduces air or water temperatures via controlled evaporation without adding water vapor to the conditioned space. The technique is especially efficient in arid regions, where it reduces compressor runtime.
Geothermal Cooling
Geothermal cooling utilizes the constant temperature conditions below the surface. It is facilitated by buried piping loops which transfer heat to or from the ground, reducing energy requirements for chillers. Although geothermal systems are still limited by location, they demonstrate the promise of renewable cooling systems.
Plumbing and Infrastructure Considerations
The performance of chillers and air handlers is affected by more than the equipment itself. Piping layout is critical and can become a limitation, especially with the rise of liquid loops.
Pipe insulation, materials, configuration, and layout all affect reliability and efficiency. Leaks pose a significant risk, so robust, durable valves, fittings, and monitoring systems are critical. Condensate management, the handling of moisture that forms on cold surfaces, is equally important: effective drainage prevents the moisture buildup that can damage equipment.
In many modern facilities, plumbing design has reached parity with electrical design and is now treated as mission-critical, with requirements such as redundant loops and precision routing.
Advancements in Cooling Technology
Cooling technology continues to advance. Rear door heat exchangers extract heat directly from server exhaust streams, reducing the overall room load. Modular cooling units allow rapid deployment at edge facilities and colocation suites. AI-driven control systems analyze real-time sensor data to tune operation continuously.
Operators can more easily adopt immersion or direct-to-chip cooling with many server and rack designs now “liquid-ready.” These advancements illustrate a trend towards flexible, more sustainable, and denser deployments.
Cooling AI and High-Performance Workloads
AI and HPC clusters extend thermal boundaries, pushing racks to draw from 50 to over 100 kW. GPU array density increases the need for liquid cooling, often direct-to-chip or immersion, using redundant fluid distribution units.
The trend will be accelerated by exascale computing, which will need thermal management integrated at both the rack and facility level. High variability is expected, since training workloads can cause sharp bursts in heat output. With advanced cooling approaches such as DLC or immersion, processors are often able to sustain "boost" clock speeds for longer, further improving performance under demanding conditions.
Monitoring and Management
Active management is now essential for thermal performance. In new data center designs and refresh projects, it is common for thousands of temperature, humidity, and pressure sensors to be deployed, each feeding metrics into a central data center infrastructure management (DCIM) platform for real-time monitoring. Such integrated monitoring enables predictive maintenance and rapid anomaly detection.
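A minimal sketch of the anomaly-detection idea: flag any sensor reading that jumps well above its own recent rolling baseline. The window size, threshold, and sample temperatures are all assumptions for illustration; real DCIM platforms use far more sophisticated models.

```python
# Flag sensor readings that deviate sharply from a rolling baseline.
# Window size and threshold are illustrative assumptions.

from collections import deque
from statistics import mean

def detect_anomalies(readings, window=5, threshold_c=3.0):
    """Yield (index, value) where a reading exceeds the rolling mean by threshold_c."""
    history = deque(maxlen=window)
    for i, temp in enumerate(readings):
        if len(history) == window and temp - mean(history) > threshold_c:
            yield i, temp
        history.append(temp)

# Hypothetical inlet temperatures (°C) with one sudden spike:
inlet_temps = [22.1, 22.0, 22.3, 22.2, 22.1, 22.4, 27.9, 22.3]
print(list(detect_anomalies(inlet_temps)))  # [(6, 27.9)]
```

The same pattern scales naturally: run one detector per sensor stream and forward flagged readings to the DCIM alerting pipeline.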
What’s more, AI-based analytics are now used to predict hot spots, optimize airflow, and even balance workloads across racks, improving efficiency and reducing risk at the same time.
Scalability and Cooling Demands
Computational needs and cooling requirements need to grow in tandem. Some cooling systems, such as modular chiller plants and liquid distribution manifolds, are easier to scale because they expand incrementally.
Resilience is equally pivotal. Redundant pumps, chillers, and pipe loops maintain cooling continuity for varying degrees of component failure. Designs using N+1 or 2N redundancy are commonplace in mission-critical systems.
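The sizing logic behind N+1 can be sketched in a few lines: after any single unit fails, the remaining capacity must still cover the design load. The unit capacity and load figures below are hypothetical.

```python
# N+1 sizing check: the plant must survive the loss of any one unit.
# Unit capacity and load values are illustrative assumptions.

def survives_one_failure(unit_kw: float, units_installed: int,
                         load_kw: float) -> bool:
    """True if remaining units still cover the load after one failure."""
    return (units_installed - 1) * unit_kw >= load_kw

# Four 500 kW chillers against a 1500 kW load is N+1; three is only N.
print(survives_one_failure(unit_kw=500, units_installed=4, load_kw=1500))  # True
print(survives_one_failure(unit_kw=500, units_installed=3, load_kw=1500))  # False
```

2N designs apply the same check with a fully duplicated plant, so the system survives the loss of an entire cooling path rather than a single unit.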
Edge Data Center Cooling
Edge facilities are often located in rugged environments or in space-constrained sites. In such locations, modular and compact systems maintain efficiency without requiring large chillers.
Because large air handling units are impractical at the edge, liquid cooling is often the more favorable option. Ruggedized solutions, including sealed immersion tanks, enable deployment in remote or harsh climates.
The Economics of Cooling
Capital expenditure and operating expenditure both matter when deciding on a cooling system. In some settings, air-based data center cooling is chosen simply because it is considered the cheapest option to implement, overlooking the fact that such installations are often costlier to operate, especially at scale.
By comparison, liquid systems are often far cheaper to operate; the lower energy costs and greater rack density they provide justify the outlay needed to install or retrofit them. More tellingly, some of the most advanced liquid cooling systems available today not only reduce energy consumption but also delay the need for facility expansion by letting existing equipment keep pace with operational demands.
How to Plan a Cooling Strategy
Every facility has unique requirements. Operators need to analyze present workloads, density growth, sustainability goals, and long-term business objectives. Modest rack densities may call only for better containment, while high densities may require a complete redesign around liquid loops or immersion tanks.
The strategy should also support operational excellence objectives, whether that means reduced energy spend, sustainability targets, enabling new capabilities, or investing in upcoming AI clusters.
Conclusion
Data center cooling has progressed from basic CRAC units to advanced liquid immersion and geothermal systems. Strategy and airflow management should be combined with innovative technologies and sustainable design principles.
With liquid cooling, AI-driven optimization, and ever-growing rack density, the future of thermal management is changing fast. For operators, success depends on treating cooling as a cornerstone of low-cost, reliable, efficient, and scalable infrastructure, not as an afterthought.
To learn more, visit Supermicro Liquid Cooling Solutions.