In a significant move that could reshape the economics of artificial intelligence, Alibaba Group Holding has unveiled a groundbreaking computing solution that it claims reduces the need for expensive Nvidia graphics processing units (GPUs) by a staggering 82%.
The new system, dubbed Aegaeon, has the potential to dramatically lower the costs associated with running large-scale AI models, a major bottleneck for many companies in the rapidly growing AI industry.
The High Cost of Idle GPUs
Cloud service providers like Alibaba Cloud and ByteDance’s Volcano Engine often run thousands of AI models concurrently. However, only a small fraction of these models are in high demand at any given time. This leads to a massive inefficiency, with a large number of GPUs sitting idle while allocated to models that are only used sporadically. The original research paper, presented at the 31st Symposium on Operating Systems Principles (SOSP) in Seoul, South Korea, highlighted this issue, noting that 17.7% of GPUs were allocated to serve a mere 1.35% of requests in Alibaba Cloud’s marketplace.
Aegaeon: A More Efficient Approach
Alibaba’s Aegaeon system tackles this problem head-on with a novel approach to GPU resource management. The system performs “auto-scaling” at the level of individual tokens — the fundamental units of data that AI models process. This allows a single GPU to dynamically switch between serving different models, even in the middle of generating a response. Such fine-grained resource allocation ensures that GPU power is always directed where it is needed most, minimizing waste.
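To make the idea concrete, here is a minimal, hypothetical sketch of token-level scheduling. The class and method names are illustrative only — Alibaba has not published Aegaeon's implementation in this form — but the sketch shows the core contrast: instead of dedicating a GPU to one model per request, the scheduler re-decides which model to serve after every single token.

```python
from collections import deque

class TokenLevelScheduler:
    """Illustrative sketch: one GPU interleaves token generation
    across many models instead of being pinned to one model."""

    def __init__(self):
        # model name -> queue of [request_id, tokens_remaining]
        self.queues = {}

    def submit(self, model, request_id, num_tokens):
        """Enqueue a request asking `model` for `num_tokens` tokens."""
        self.queues.setdefault(model, deque()).append([request_id, num_tokens])

    def step(self):
        """Generate one token, then reconsider which model to serve.

        Coarse-grained schedulers swap models only between whole
        requests; switching at token granularity lets a single GPU
        follow demand across models moment to moment.
        Returns (model, request_id) for the token served, or None if idle.
        """
        busy = {m: q for m, q in self.queues.items() if q}
        if not busy:
            return None
        # Simple demand heuristic: serve the model with the most pending tokens.
        model = max(busy, key=lambda m: sum(t for _, t in busy[m]))
        req = busy[model][0]
        req[1] -= 1              # "generate" one token for this request
        if req[1] == 0:
            busy[model].popleft()  # request complete
        return (model, req[0])
```

In this toy setup, a GPU serving a backlogged 72B-parameter model can still slip in tokens for a rarely used model the instant a request for it arrives, rather than leaving a second GPU idle on standby for it.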
Impressive Results
During a three-month beta test in Alibaba Cloud’s model marketplace, Aegaeon demonstrated remarkable results. The system reduced the number of Nvidia H20 GPUs required to serve dozens of AI models (with up to 72 billion parameters) from 1,192 to just 213. This represents an 82% reduction in the number of GPUs needed, a massive cost saving.
Furthermore, Aegaeon was able to support up to seven different models on a single GPU, a significant improvement over existing systems that can typically only handle two or three. The system also slashed the latency associated with switching between models by an impressive 97%.
Navigating Geopolitical Tensions
This technological breakthrough comes at a time of heightened trade tensions between the US and China, with the US imposing strict export controls on advanced semiconductor technology. By optimizing software to reduce reliance on US-made hardware, Chinese tech giants like Alibaba are finding innovative ways to navigate these restrictions and maintain their competitive edge in the global AI race.
This development from Alibaba Cloud is a clear indication that the future of AI is not just about building more powerful hardware, but also about creating smarter, more efficient software to make the most of the resources we already have.