What are the biggest cloud providers in Asia doing to meet the rising demand for AI inference? Omdia's latest research offers an in-depth look at the evolving challenges of AI inference operations, the key trade-offs between throughput, latency, and support for diverse AI models, and the possible solutions. The report provides detailed coverage of companies such as Huawei, Baidu, Alibaba, ByteDance, Tencent, NAVER, and SK Telecom Enterprise. It examines which GPUs, AI accelerators, and AI-optimized CPUs these companies offer, their pricing, their stockpiles of NVIDIA GPUs, their AI service portfolios, and the current status of their own AI models and custom chip projects.
Despite heavy stockpiling of NVIDIA H800 and H20 GPUs during 2024 and early 2025, prior to the imposition of US export controls, these high-performance chips are difficult to find in Chinese cloud services, suggesting they are primarily reserved for the hyperscalers' own model development projects. Similarly, relatively few services use any of the Chinese AI chip projects; exceptions include Baidu's on-premises cloud products and some Huawei Cloud services, although these remain limited. Chinese hyperscale companies are well advanced in adopting best practices such as decoupled prefill and generation, and they publish seminal research in fundamental AI; however, their research papers often note that training runs are carried out on Western GPUs, with a few notable exceptions.
"The real triumph in Chinese semiconductors has been CPUs rather than accelerators," says Omdia Principal Analyst and author of the report, Alexander Harrowell. "Chinese Arm-based CPUs are clearly in production at scale and are usually optimized for parallel workloads in a way like Amazon Web Services' Graviton series. Products such as Alibaba's YiTian 710 offer an economically attractive solution for serving the current generation of small AI models such as Alibaba Qwen3 in the enterprise, where the user base is relatively small and workload diversity is high."
If modern GPUs are required, the strongest offering Omdia found was the GPU-as-a-service product SK Telecom is building in partnership with Lambda Labs. Omdia also observed significant interest in moving Chinese workloads outside the Great Firewall in hopes of accessing modern GPUs and, potentially, additional training data. Among other important findings, nearly all the companies covered now offer models-as-a-service platforms that support fine-tuning and other customizations, making this one of the most common ways for enterprises to access AI capabilities. Chinese hyperscalers are especially interested in supporting AI applications at the edge; ByteDance, for example, offers a pre-packaged solution that monitors restaurant kitchens and reports whether chefs are wearing their hats.
ABOUT OMDIA
Omdia, part of Informa TechTarget, Inc. (Nasdaq: TTGT), is a technology research and advisory group. Our deep knowledge of tech markets, grounded in real conversations with industry leaders and hundreds of thousands of data points, makes our market intelligence our clients' strategic advantage. From R&D to ROI, we identify the greatest opportunities and move the industry forward.
View source version on businesswire.com: https://www.businesswire.com/news/home/20250723727908/en/
Contacts:
Fasiha Khan: fasiha.khan@omdia.com