The potential of high-performance computing (HPC) is incredible. But all that processing power comes with a high energy demand, which has both a financial and environmental cost. For those housing HPC, the challenge is twofold: to meet and manage this energy requirement, and to do so as efficiently as possible.
A regular rack in a data centre may have a power requirement of 3–7 kW; an HPC rack can draw as much as 40 kW. That’s because HPC packs a large number of nodes (each made up of CPUs and GPUs) into the same footprint as a standard rack: high-density computing. It takes a lot of energy to power all those cores, and they generate a lot of heat, which must be removed to avoid damage to sensitive electrical components. Efficiency is therefore a key challenge and priority for HPC.
1. Utilisation of hardware
In this case, efficiency comes down to designing a cluster that fits your workload. Idle nodes still require power and cooling – and while they sit unused you are not getting any return from them.
Squeezing as many nodes as possible into a rack can be a false economy. Apart from the energy cost of unused capacity, there is also a performance cost in the reduced memory bandwidth per core.
For optimum efficiency, your system should be designed to meet – not exceed – your requirements. Where demand fluctuates wildly, it is worth investing in software that can power down idle nodes to increase energy efficiency.
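To put a rough number on the cost of idle capacity, here is a minimal sketch. All figures in it (idle power draw per node, electricity price, PUE) are illustrative assumptions, not measured rates:

```python
# Estimate the annual electricity cost of keeping idle nodes powered on.
# Every figure below is an illustrative assumption, not a measured value.

IDLE_NODE_POWER_KW = 0.25         # assumed idle draw per node (kW)
ELECTRICITY_PRICE_PER_KWH = 0.20  # assumed price per kWh (GBP)
PUE = 1.14                        # facility overhead multiplier (cooling, lights...)
HOURS_PER_YEAR = 24 * 365

def annual_idle_cost(idle_nodes: int) -> float:
    """Yearly electricity cost of idle nodes, including facility overhead (PUE)."""
    energy_kwh = idle_nodes * IDLE_NODE_POWER_KW * HOURS_PER_YEAR * PUE
    return energy_kwh * ELECTRICITY_PRICE_PER_KWH

# Ten idle nodes left on all year:
print(round(annual_idle_cost(10), 2))  # → 4993.2
```

Even under these modest assumptions, ten idle nodes cost thousands of pounds a year before doing any useful work, which is the case for right-sizing the cluster or powering nodes down.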
2. Cooling technologies
HPC requires specialised cooling technologies beyond the needs of general computing (where the temperature can typically be controlled by good ventilation and purpose-built air conditioning). With a high number of CPU and GPU cores in an HPC rack, the power demand is significantly increased and so is the heat load. Generally, air cooling can’t handle the heat load of HPC – and certainly not efficiently.
Most on-site server rooms can't cope with HPC's demands, which is why it's usually more practical for companies to use a data centre specifically designed to handle HPC heat loads. Cooling can account for as much as 50% of a data centre's power costs, so energy efficiency matters for both cost and environmental impact. There are currently several options for HPC cooling:
- Rear door heat exchanger – Chilled water is fed to a metal coil (the heat exchanger) mounted on the back of the rack. Heat from the servers is transferred to the water and cool air is ejected from the heat exchanger – essentially a radiator working in reverse. This suits all rackmount hardware and server/HPC suppliers.
- Liquid cooling – Water is piped to a heatsink mounted directly on hot components such as the CPU or GPU, enabling highly efficient heat transfer. This requires bespoke hardware and custom connections within the rack environment.
- Immersion cooling – Hardware is immersed in a dielectric (i.e. non-conductive) coolant. Heat from the hardware transfers to the liquid, creating circulation that keeps the components at the right operating temperature. This is highly custom and doesn’t yet fit traditional rackmount equipment well, but it is an area being developed by vendors such as Iceotope.
At 4D, our rear-door cooling is highly energy-efficient and doesn’t affect your ability to access your hardware for maintenance. The units are made specially for us by USystems, and they are part of what has enabled us to achieve a PUE (power usage effectiveness) score of just 1.14.
3. Constantly monitoring the situation
Ultimately, achieving optimum efficiency is about getting specific. Not just making blanket plans for the entire data floor, but also identifying opportunities to create power savings and limit potential power loads.
Monitoring of heat distribution
Any decent data centre will use heatmaps to identify hot spots, and to plan and forecast power usage as rack density increases. Targeted cooling saves power and helps us recognise and eliminate fire risks.
A study by Intel found that, by redistributing workloads to CPUs with higher energy efficiency, they were able to make substantial power savings – in some cases as much as 8%. It’s a great example of how smarter workload placement can reduce energy consumption.
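The idea behind that result can be sketched as a toy scheduler: fill the most energy-efficient CPUs first, spilling over to less efficient ones only when needed. This is a simplified illustration, not Intel's actual method, and all the names and figures are made up:

```python
# Toy sketch of efficiency-aware workload placement (illustrative only).
# Greedily assign jobs to the CPUs with the lowest watts-per-core first.

def place_workloads(jobs_cores, cpus):
    """jobs_cores: list of core counts required by each job.
    cpus: list of (name, free_cores, watts_per_core) tuples.
    Returns {job_index: cpu_name} for every job that could be placed."""
    ranked = sorted(cpus, key=lambda c: c[2])  # most efficient CPUs first
    free = {name: cores for name, cores, _ in ranked}
    order = [name for name, _, _ in ranked]
    placement = {}
    for i, cores_needed in enumerate(jobs_cores):
        for name in order:
            if free[name] >= cores_needed:
                free[name] -= cores_needed
                placement[i] = name
                break
    return placement

# Two 8-core jobs fill the efficient CPU; the 4-core job spills to the older one.
print(place_workloads([8, 8, 4], [("old", 16, 3.0), ("new", 16, 2.0)]))
```

Real schedulers weigh many more factors (thermals, interconnect locality, job priority), but the greedy preference for efficient hardware is the core of the saving.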
Monitoring PUE
The PUE measurement is the ratio of the total power used by the data centre to the power delivered to the computing equipment; the overhead includes lighting, cooling, pumps, and HVAC. As I mentioned, ours is just 1.14, much lower (better) than the industry average.
To lower its PUE, a data centre has to monitor and optimise the energy use of everything on site, so the figure is a good indicator of how committed a provider is to energy efficiency. And since energy use is directly linked to emissions, it is also a useful benchmark of how green their technology is.
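Because PUE is just a ratio, a quick sketch makes it concrete. The power figures in the example are made up for illustration:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power.
    A perfect facility (zero overhead for cooling, lights, etc.) scores 1.0."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment load must be positive")
    return total_facility_kw / it_equipment_kw

# Example: 1,140 kW drawn at the meter, 1,000 kW reaching the IT equipment.
print(round(pue(1140, 1000), 2))  # → 1.14
```

In other words, a PUE of 1.14 means only 14% of the facility's power goes to anything other than the computing hardware itself.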
The next steps for energy-efficient HPC
When it comes to HPC, you can drive efficiency by building the right sized solution for your needs and housing it in a purpose-built facility.
Not all data centres are designed with HPC in mind, and adding it on as an afterthought can lead to ineffective solutions with poor efficiency. When you are choosing between data centres, exploring their chosen cooling methods is a good starting point, as is finding out their PUE.
4D designed our Gatwick site with HPC in mind, and our cooling towers are award-winning for their energy efficiency. So if you’re looking at AI programmes, 3D modelling, or anything else that requires high-density computing, get in touch to explore what we can offer you.