Water is a core part of many data center cooling systems. But as densities - and therefore temperatures - increase, questions need to be asked about the right temperature for the water cooling these systems.
As the chips running servers become denser and more powerful, operators face the question of whether to lower the temperature of the water going to these chips, to the point where we will have to start focusing more on cooling the water systems themselves.
The move to liquid
Historically, data centers have been kept at around 20°C to 22°C, but groups such as the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) have been advising organizations to set thermostats higher for years. As a result, data center temperatures have been creeping up: Facebook parent company Meta raised its temperatures to 29.4°C, Google went up to 26.6°C, and Microsoft has published guidelines suggesting temperatures could go up to 27°C.
Typical legacy data centers have chilled water set points between 42-45°F (6-7°C). Facilities that have gone through optimization of their cooling systems have successfully raised their chilled water temperatures to 50°F (10°C) or higher. According to Johnson Controls, it is estimated that for every 1°C (1.8°F) increase in the temperature of chilled water, there is approximately 2-3 percent savings in power consumption for a typical chiller.
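As a rough illustration of the Johnson Controls rule of thumb, the sketch below estimates the chiller power saving from raising a chilled water set point. It assumes the 2-3 percent figure can be applied linearly per degree, which the estimate itself does not state; the function name and parameters are hypothetical.

```python
def chiller_savings_range(delta_c, low_rate=0.02, high_rate=0.03):
    """Estimate fractional chiller power savings for a set point raise.

    delta_c: increase in chilled water temperature, in degrees Celsius.
    low_rate / high_rate: savings per degree C (the 2-3 percent rule of thumb).
    Assumption: savings scale linearly with the temperature increase.
    """
    if delta_c < 0:
        raise ValueError("delta_c must be non-negative")
    # Cap at 100 percent so a large delta cannot yield a nonsensical result.
    return (min(1.0, delta_c * low_rate), min(1.0, delta_c * high_rate))


# Example: raising the set point from 7°C to 10°C, as in the optimized facilities.
low, high = chiller_savings_range(10 - 7)
print(f"Estimated chiller savings: {low:.0%} to {high:.0%}")
```

Under that linear assumption, a 3°C raise works out to roughly 6 to 9 percent less chiller power. Real chillers will deviate from this depending on load and design.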
A recent DCD Broadcast analyzed a case study about how a service provider in the UK achieved a £1.5m ($1.9m) annual saving by increasing the temperature of the data hall, which only translated into a 0.3 percent increase in hardware failure risk.
“Cooling has always been the second-largest consumer of energy in the data center after the IT load, and this is mostly energy used to cool whatever the heat transfer medium is - be it air or liquid. So the less energy is spent there, the better the overall efficiency of the facility,” says Vlad-Gabriel Anghel, director of solutions engineering at DCD’s training unit DCD>Academy.
The picture is changing as the industry moves towards predominantly liquid-cooled data centers, where a liquid such as water circulates directly over the heat-producing components and removes heat. Water has a much higher thermal capacity than air, meaning data centers can support higher-density chips and use less energy to cool them.
Though air-based cooling options exist for racks drawing more than 20kW, the drawbacks start to outweigh the benefits, leading operators to switch to liquid systems. For years, 30kW was seen as the top-end of high-density deployments, and air was good enough. With the advent of generative AI and what classes as ‘high-density’ now potentially reaching more than 100kW, sticking with air cooling alone is no longer an option.
The fluid running through liquid cooling systems is much warmer than the water found in chilled water systems, but the industry is yet to standardize on the best approach. At the same time, chips are becoming increasingly dense, and the temperatures of the water being supplied to these systems are coming down.
Data center operators have long been accused of being over-cautious by overcooling their air-cooled data centers to protect the IT hardware and avoid even the merest risk of overheating the data halls. Showing too much trepidation on liquid cooling risks the same issue.
Higher water temperatures mean less energy used for cooling – great for PUE – but risk running chips closer to their thermal limit. So, how hot is too hot?
What is the right temperature for water?
ASHRAE introduced a paper on liquid cooling back in 2011. The paper set out broad classes - W1, W2, W3, W4, and W5 - based on the cooling water temperature. Originally those classes were 17°C, 27°C, 32°C, 45°C, and over 45°C, respectively. When the work was updated in 2022, new temperature refinements were required, including a temperature of 40°C, and ASHRAE moved to new class definitions: W17, W27, W32, W40, W45, and W+.
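The 2022 class names can be read as a simple lookup. The sketch below maps a water temperature to the lowest class that covers it, assuming the number in each class name is the upper temperature limit in °C for that class and that W+ covers anything above 45°C. The function names are hypothetical, and the Fahrenheit helper is included only because this article quotes temperatures in both units.

```python
from bisect import bisect_left

# Assumed upper limits in °C, taken from the class names listed above.
CLASS_LIMITS_C = [17, 27, 32, 40, 45]
CLASS_NAMES = ["W17", "W27", "W32", "W40", "W45", "W+"]


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (temp_f - 32) * 5 / 9


def ashrae_class(water_temp_c):
    """Return the lowest class whose assumed limit covers water_temp_c.

    A temperature exactly on a limit (for example 27°C) falls in that class;
    anything above 45°C falls in W+.
    """
    return CLASS_NAMES[bisect_left(CLASS_LIMITS_C, water_temp_c)]


# Example: 85°F water is about 29.4°C, which falls in W32 under these limits.
print(ashrae_class(fahrenheit_to_celsius(85)))
```

Treating each limit as inclusive is a design choice made for this sketch; the ASHRAE documents themselves should be consulted for the exact class boundaries.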
DCD>Academy’s Anghel says there is no optimal temperature for water in liquid-cooled systems, because the best temperature will vary depending on the set-up of the facility.
“This will depend entirely on the type of liquid cooling used as well as the environment the liquid cooling system is in, the type of chip and its TDP as well as the utilization of the chip,” he says. “A rear-door air-assisted liquid cooling solution will have different temperatures to a closed loop direct to chip cooling system.”
According to Uptime, water temperatures in liquid-cooled systems today seem to be converging around 32°C (89.6°F) for facility water – what is described as a “good balance” between facility efficiency, cooling capacity, and support for a wide range of DLC systems. The company notes, however, that this often requires additional heat rejection infrastructure either in the form of water evaporation or mechanical cooling for higher-density chips.
“Many operators have already opted for conservative water temperatures as they upgrade their facilities to incorporate a blend of air and liquid cooled IT. Others will install DLC systems that are not connected to a water supply but are air-cooled using fans and large radiators,” the company said in a recent report.
The analyst firm notes current high-end processors (up to 350W thermal design power) and accelerators (up to 700W on some GPUs) can be “effectively” cooled even at high liquid coolant temperatures, allowing the facility water supply for the Direct Liquid Cooling system to be running as high as 104°F (40°C), and even up to 113°F (45°C).
Andrew Bradner, general manager for Schneider Electric’s cooling business, tells DCD, however, that after chips reach 500W, supply water temperatures have to come down to 85°F (30°C). And for 700W, the temperature may have to come down to as low as 80°F (27°C).
“This idea that you're going to run liquid cooling at 122-140°F (50-60°C) water is probably going to be highly unlikely, especially in the training loads,” Bradner says.
And in the same way air-cooled data centers have generally been run colder out of caution, customers using liquid cooling deployments are being equally prudent.
As part of its AI-focused redesign, Meta has settled on 85°F (30°C) for the water it supplies to the hardware, and hopes to get the temperature more widely adopted through the Open Compute Project.
Anecdotally, however, DCD has heard from operators who expected customers to go with the ASHRAE definition of W27 (27°C/80°F output water), only to see them opt for the W17 (17°C/62°F) option instead.
“There's a lot of discussion at times that the water temperatures are going to go to 104-122°F (40-50°C),” says Bradner. “But as the power densities of the GPUs start to get over 500W to 700W each, the case temperatures that they're starting to see are requiring that that water comes down lower.”
“Once you've hit 700W, the water temperature has to come down to about 80°F (27°C). And we have customers that are asking for between 68-75°F (20-24°C) supply water.”
Free cooling vs assisted cooling
When you need 68-75°F (20-24°C) water, some form of assisted cooling is required in many cases – especially in hotter climates.
As an example, Bradner said Schneider recently performed an assessment with a partner around free-cooling – which relies on pulling in naturally cool air or water instead of mechanical refrigeration – at higher densities.
As long as the chips were at 300W, 95 percent of the partner's sites could get away with not having any type of mechanical assist to provide the water temperatures they needed to run a liquid system.
However, once chips went over 500W, only five percent of the sites could support free-cooling, and 95 percent needed some sort of compressor-based, mechanically assisted solution.
“So I think that's the challenge,” says Bradner. “As the chips get more powerful and more power hungry, the internal dissipation that needs to happen to the chip case housing requires colder water to be able to still support reliable cooling of those chips.”
But he notes that for water temperatures around 80-86°F (27-30°C), there are still large parts of the year that you're going to get free cooling, and operators may only need assisted cooling for the hottest summer months.
“Right now, you can run the 300W-400W chips that are available with far higher water temperatures,” Bradner says. “But that's going to change dramatically once these more powerful GPUs become readily available and deployed at scale. We're seeing many of our largest customers are talking about water temperatures that are more in the 80-86°F (27-30°C) range, not 104-122°F (40-50°C).”
DCD>Academy’s Anghel warns that if water temperatures are set too low, operators risk overcooling chips and ultimately wasting energy and repeating mistakes long made with air cooling.
“Any watt spent cooling water is another watt removed from the IT load,” he says. “The same efficiency mistakes are being made regardless of the cooling medium.”