Research-Based Application of Composable GPU Solution to Multiple AI Training Usage

January 07, 2023

BACKGROUND

Along with the mega-trends of cloud computing proliferation, AI growth, and network cloudification, a broader range of emerging workloads is pressuring data centers to expand their infrastructure. Driven by internal requirements and research interests, a widely known research institute sought a way to overcome possible future infrastructure constraints. They wanted to design a flexible, extensible base architecture that could improve production efficiency without tearing down existing data centers or building new ones.

The research team found that hardware configuration affects application performance. They therefore turned to the elasticity of a composable system to meet application requirements across different AI workload environments. In their proposed concept, the design had to support multiple tasks by drawing resources from a pool of processors, accelerators, storage, and network devices. They searched online for partners to complete the project and found H3, which has worked on PCIe switch technology since 2014. The institute then contacted H3.

SOLUTION

Traditionally, a server is paired with a fixed set of accelerators and memory. Whatever the size of the workload, the server processes it with the same hardware, which leads to over- or under-provisioning. H3’s PCIe switch technology realizes the concept of pooling resources in separate chassis at the rack level. Disaggregated devices can be dynamically assigned to upstream compute servers through PCIe interconnects, effectively tackling the resource waste problem.
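The pooling idea above can be illustrated with a minimal sketch. It is not H3's management software or API: the class name GpuPool, the methods compose and release, and the host and device names are all hypothetical, and the model only tracks which pooled device is attached to which server.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class GpuPool:
    """Toy model of a rack-level accelerator pool (hypothetical, for illustration)."""

    free: Set[str]                                   # devices not attached to any host
    assigned: Dict[str, Set[str]] = field(default_factory=dict)  # host -> devices

    def compose(self, host: str, count: int) -> Set[str]:
        """Attach `count` free devices to `host`, sized to its workload."""
        if count < 0:
            raise ValueError("count must be non-negative")
        if count > len(self.free):
            raise RuntimeError("not enough free devices in the pool")
        picked = set(sorted(self.free)[:count])      # deterministic choice
        self.free -= picked
        self.assigned.setdefault(host, set()).update(picked)
        return picked

    def release(self, host: str) -> None:
        """Return all of a host's devices to the pool for reuse."""
        self.free |= self.assigned.pop(host, set())


# Hypothetical rack with eight pooled GPUs shared by two servers.
pool = GpuPool(free={f"gpu{i}" for i in range(8)})
pool.compose("server-a", 6)   # heavy training job
pool.compose("server-b", 2)   # light job
pool.release("server-a")      # job finished, devices go back to the pool
print(sorted(pool.free))
```

In a fixed pairing, the six devices used by the heavy job would stay bound to one server even when idle. In the pooled model they return to the free set and can be composed onto another host.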

Building on its experience in PCIe switch development, H3 achieved a breakthrough by connecting Broadcom PEX9797 switches to maximize the number of lanes (288 PCIe 3.0 lanes) available for device and host links. This infrastructure design outperforms the traditional one by sharing independent accelerators, storage, and communication devices at the rack level without blocking. Users can arrange lanes according to each application's bandwidth requirements. For example, the research team found that in a specific fixed configuration, GPU performance hits a bottleneck once the interconnect bandwidth saturates at its maximum of 16 GB/s (Figures 1 and 2).

In other words, no matter how many GPUs were installed, that hardware system could only deliver the performance of about four GPUs. With PCIe switch technology, bandwidth can be raised or reduced through dynamic lane arrangement instead of being locked, so adding GPUs becomes meaningful. Resources also become composable and can be arranged according to the task. For example, Figures 3 and 4 show that the system can allocate a different set of accelerators to each server for its specific workload, greatly enhancing agility and flexibility.
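The saturation effect can be approximated with a simple back-of-the-envelope model. This is a sketch under stated assumptions, not the research team's methodology: the 16 GB/s cap comes from the text, while the per-GPU demand of 4 GB/s is a hypothetical value chosen so that the model reproduces the "about four GPUs" observation, and the 32 GB/s case is a hypothetical wider lane arrangement.

```python
def effective_gpus(num_gpus: int, link_gbps: float, per_gpu_gbps: float) -> float:
    """GPUs' worth of throughput that a shared host link can actually feed.

    Assumes every GPU needs the same sustained bandwidth and that the
    interconnect is the only bottleneck (a deliberate simplification).
    """
    if num_gpus < 0 or link_gbps < 0 or per_gpu_gbps <= 0:
        raise ValueError("invalid bandwidth model parameters")
    demand = num_gpus * per_gpu_gbps        # total bandwidth the GPUs ask for
    delivered = min(demand, link_gbps)      # the link caps what is delivered
    return delivered / per_gpu_gbps


FIXED_LINK_GBPS = 16.0   # saturation point reported in the text
WIDER_LINK_GBPS = 32.0   # hypothetical: more lanes assigned to the host
PER_GPU_GBPS = 4.0       # hypothetical per-GPU demand

for n in (2, 4, 8):
    fixed = effective_gpus(n, FIXED_LINK_GBPS, PER_GPU_GBPS)
    wider = effective_gpus(n, WIDER_LINK_GBPS, PER_GPU_GBPS)
    print(f"{n} GPUs installed: fixed link feeds {fixed:.1f}, wider link feeds {wider:.1f}")
```

Under these assumptions, installing eight GPUs behind the fixed link yields no more throughput than four, whereas widening the link lets the additional GPUs contribute.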

RESULTS

According to the research report, the study confirms that a composable system can effectively buffer the influence of different component configurations on overall performance. One excerpt from the paper, stating that a researcher can run AI workloads efficiently thanks to the flexibility of the composable system, is a firm acknowledgment of our expertise and hard work. Besides producing a valuable research paper, the team has since adopted composable solutions in its labs to run different AI workloads. The research team remains our customer, having purchased two generations of PCIe composable systems and management solutions.

category : Composable Infrastructure
