
A Balanced GPU Solution for Composability and Performance


AI productivity and GPU utilization are crucial

Even for large enterprises, high-performance GPUs and AI experts are very expensive resources. Take Amazon as an example: a single NVIDIA V100S costs the company $7,999, and the average annual salary of an AI expert is $120K. Obviously, no AI or HPC company hires only one AI expert or relies on a single GPU to accelerate its computing, so optimizing the use of these pricey GPUs to further increase the AI experts' productivity is an unavoidable challenge for many enterprises.

Hardware composability provides the required GPUs on demand

There are two approaches to enhancing GPU utilization: software (SW) solutions such as GPU virtualization, and hardware (HW) solutions, which we explore in this blog.

By the HW approach, we mean composable GPU solutions, also called "GPU pooling". GPU pooling is commonly built on InfiniBand or Ethernet. Ethernet is less than ideal for high-performance scale-out instances, as its latency limits how far the GPUs can scale. InfiniBand, on the other hand, is widely used for GPU pooling in cloud computing, where hosts access the GPUs through RDMA. The InfiniBand GPU pooling architecture is more reliable, but it is also very expensive in both equipment and operations.

a.      Large-scale GPU deployment using InfiniBand or Ethernet.


 

Aside from Ethernet and InfiniBand, there is another, more cost-efficient way to achieve GPU pooling: PCIe fabrics. Every CPU and GPU already has a built-in PCIe interface, and the PCIe channel delivers the lowest latency while keeping deployment cost low. But how exactly do PCIe fabrics make GPUs composable? Let's look at the diagram below.

b.      4-server deployment using a PCIe Gen 4 fabric.

 

This diagram demonstrates how the GPUs in the chassis can be dynamically provisioned to the connected hosts. One can easily assign the right amount of GPU resources to a workload (e.g., AI training) and, once the work is done, return them to the pool, making the resources available for new tasks. One thing to highlight here: because the PCIe protocol is used end to end, the provisioned GPU units are treated as direct-attached devices by the host servers, so channel-induced overhead in the data-transfer path is minimized.
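
Because a composed GPU enumerates as an ordinary direct-attached PCIe device, an allocation can be verified with standard host-side tools and no fabric-specific code. Below is a minimal sketch, assuming an NVIDIA driver and the off-the-shelf pynvml bindings (neither is part of the H3 solution) are installed on the host:

```python
# Minimal sketch: list GPUs the host currently sees via NVML.
# Assumes `pip install nvidia-ml-py` and an NVIDIA driver on the host.
import pynvml

pynvml.nvmlInit()
count = pynvml.nvmlDeviceGetCount()
print(f"{count} GPU(s) currently provisioned to this host")
for i in range(count):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)        # bytes on older pynvml
    bus = pynvml.nvmlDeviceGetPciInfo(handle).busId
    # The PCI bus ID shows the GPU sitting in the host's own PCIe
    # topology, exactly like a locally installed card.
    print(f"  GPU {i}: {name} @ {bus}")
pynvml.nvmlShutdown()
```

Running this before and after a provisioning operation should show the GPU count change, with no reboot and no special driver for the fabric.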

An easy-to-use self-service tool is the key to composable GPU solutions

AI experts use GPUs for many different functions at different stages of development, and the computing power, i.e., the number of GPUs, required at each stage varies. For example, a single GPU may be enough for a test run, while 4 or more GPUs are required for AI training; there are even stages, such as result investigation, where a GPU is unnecessary. Given these dynamic and complex GPU usage patterns, an easy-to-compose, self-service GPU pooling structure is a must if AI experts are to use their limited GPU resources efficiently. H3 Platform provides an integrated solution, a composable GPU chassis plus a PCIe device management UI, that helps AI experts overcome these difficulties in computing-resource allocation.

Cluster configurations of the PCIe Gen4 GPU solution

 

This diagram demonstrates a typical GPU configuration: 4 hosts share 8 GPUs in the cluster. These GPUs can be provisioned to any connected host or returned to the GPU pool. At maximum, a host can get up to 8 GPUs when performing AI training.

Configuration:

- 4x host servers

- 2x GPU chassis, each with 4x GPUs

- Each GPU chassis provides 2x PCIe Gen4 x8 links.

In this example, PCIe Gen4 x8 links connect the hosts and GPU chassis over 2-meter MiniSAS cables.
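
To put the link widths in perspective, here is a rough back-of-the-envelope calculation of raw PCIe Gen4 bandwidth. These are theoretical per-direction figures; real-world throughput is lower once protocol and software overhead are accounted for:

```python
# Back-of-the-envelope PCIe Gen4 bandwidth, per direction.
# Gen4 signals 16 GT/s per lane with 128b/130b line coding.
GT_PER_LANE = 16                            # gigatransfers per second
ENCODING = 128 / 130                        # line-coding efficiency
GB_PER_LANE = GT_PER_LANE * ENCODING / 8    # ~1.97 GB/s per lane

for lanes in (8, 16):
    link = lanes * GB_PER_LANE
    print(f"Gen4 x{lanes}: ~{link:.1f} GB/s per direction per link")
# Gen4 x8 : ~15.8 GB/s -> each chassis uplink in this configuration
# Gen4 x16: ~31.5 GB/s -> the higher-performance configuration below
```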

 

 

This diagram demonstrates a GPU configuration for higher performance: 2 hosts share 8 GPUs in the cluster over PCIe Gen4 x16. All 8 GPUs can be provisioned to any connected host or returned to the GPU pool, so each host can get up to 8 GPUs when performing AI training.

Configuration:

- 2x host servers

- 2x GPU chassis, each with 4x GPUs

- Each GPU chassis provides 2x PCIe Gen4 x16 links.

In this example, PCIe Gen4 x16 links connect the hosts and GPU chassis over 2-meter MiniSAS cables.
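
The practical difference between the two configurations shows up in data-staging time. A minimal sketch, using a hypothetical 200 GB training dataset (an illustrative figure, not a benchmark) and the theoretical Gen4 link bandwidths computed above:

```python
# Rough staging-time estimate: moving a training dataset from host
# memory to pooled GPUs over each link width. Ideal-case numbers that
# ignore protocol and software overhead; dataset size is illustrative.
DATASET_GB = 200
GEN4_GB_PER_LANE = 16 * 128 / 130 / 8       # ~1.97 GB/s per lane

for lanes in (8, 16):
    bw = lanes * GEN4_GB_PER_LANE           # GB/s per direction
    print(f"x{lanes}: ~{DATASET_GB / bw:.1f} s to stage {DATASET_GB} GB")
# x8 : ~12.7 s    x16: ~6.3 s
```

Halving the staging time is why the x16 configuration trades host count for per-host bandwidth.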

How to allocate GPUs to hosts and return them to the GPU pool

H3 Platform provides a user-friendly management UI that controls all GPU allocation activities. Users can easily configure and manage the GPU pool via a 1GbE management port.
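
For users who prefer scripting to clicking, automation over that management port might look like the sketch below. The endpoint paths, payload fields, and address are hypothetical placeholders for illustration, not H3's documented API:

```python
# Hypothetical sketch of scripting GPU allocation over the 1GbE
# management port. Endpoints and fields below are placeholders;
# H3's actual management interface may differ.
import requests

MGMT = "https://192.168.0.10"   # chassis management address (example)

def provision_gpus(host_id: str, count: int) -> None:
    """Request `count` GPUs from the pool for a given host."""
    r = requests.post(f"{MGMT}/api/gpu/provision",
                      json={"host": host_id, "count": count},
                      timeout=10)
    r.raise_for_status()

def release_gpus(host_id: str) -> None:
    """Return this host's GPUs to the shared pool."""
    r = requests.post(f"{MGMT}/api/gpu/release",
                      json={"host": host_id}, timeout=10)
    r.raise_for_status()

provision_gpus("host-1", 4)   # e.g., grab 4 GPUs for a training run
release_gpus("host-1")        # hand them back when the job finishes
```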

a.      H3 management center UI – GPU provisioning

 

b.      H3 management center UI – Resource performance

 

How to manage multiple GPU chassis in your network

Multiple GPU chassis may be needed when more hosts or more GPUs are required, or when a unique project calls for a special setup. Fortunately, H3 management center can manage multiple units, supporting all the unique setups that different user groups need.
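
A multi-chassis deployment can then be polled in a loop to build a cluster-wide inventory. As before, this is a hypothetical sketch: the addresses, endpoint, and response fields are illustrative placeholders, not H3's documented API:

```python
# Hypothetical multi-chassis inventory sweep: query each chassis's
# management address and tally the GPUs still free in the pool.
import requests

CHASSIS = ["https://192.168.0.10", "https://192.168.0.11"]  # examples

def free_gpus(base_url: str) -> int:
    """Count pool GPUs on one chassis that are not assigned to a host."""
    r = requests.get(f"{base_url}/api/gpu/pool", timeout=10)
    r.raise_for_status()
    return sum(1 for gpu in r.json() if gpu.get("assigned_host") is None)

total = sum(free_gpus(url) for url in CHASSIS)
print(f"{total} GPU(s) available across {len(CHASSIS)} chassis")
```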

Multi-chassis management on H3 management center

 

Building your own AI cloud with the newest PCIe Gen4 solution

Imagine a group of AI experts looking to build their own AI environment on a limited infrastructure budget. Purchasing separate high-performance GPU servers would be impractical: the costs are high and GPU utilization cannot be optimized. The architecture of servers plus a GPU pooling solution, on the other hand, can significantly reduce cost while increasing productivity at the same time.

