eastwest connect

Tencent released a high-performance computing cluster | Helping enterprises train large AI models on the cloud

 


On April 13, Tencent announced the release of a new high-performance computing cluster for large-model training.

The cluster is the new generation of Tencent Cloud's HCC (High-Performance Computing Cluster), built for large-scale model training, with overall performance three times that of the previous generation.

It is equipped with NVIDIA H800 Tensor Core GPUs and provides high-performance, high-bandwidth, low-latency computing support for intelligent workloads.

Training today's large AI models is inseparable from high-performance computing clusters, and we are glad to share this news with you right away.


What is a computing cluster?

Ordinary computing is performed on a single computing chip.

When a workload involves massive amounts of computation, however, a single chip cannot keep up. Thousands of servers must be connected over a network into a large computing cluster that works as one, far more capable than any single machine.

A large AI model is typically trained on trillions of words, and parameter counts have surged into the trillions. Only a high-performance computing cluster can handle workloads at this scale.
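To make the idea concrete, here is a minimal, hypothetical sketch of data-parallel training, the basic pattern such clusters rely on: the dataset is split across workers, each worker computes a gradient on its own shard, and the averaged gradient updates the shared model. The function names and the one-parameter model are illustrative only, not Tencent's actual training stack.

```python
def shard(dataset, n_workers):
    """Split a dataset into n_workers roughly equal shards."""
    return [dataset[i::n_workers] for i in range(n_workers)]

def local_gradient(w, xs, ys):
    """Gradient of mean squared error for a toy 1-parameter model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def train(xs, ys, n_workers=4, lr=0.01, steps=200):
    """Each 'worker' computes a gradient on its shard; gradients are
    averaged (as an all-reduce would do on a real cluster) and applied."""
    w = 0.0
    x_shards, y_shards = shard(xs, n_workers), shard(ys, n_workers)
    for _ in range(steps):
        grads = [local_gradient(w, xs_i, ys_i)
                 for xs_i, ys_i in zip(x_shards, y_shards)]
        w -= lr * sum(grads) / n_workers  # average gradients, then update
    return w
```

On a real cluster each shard lives on a different server and the averaging step is a network collective, but the arithmetic is the same.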


What does a high-performance computing cluster depend on?

The power of a computing cluster is jointly determined by single-machine compute, the network, and storage. Like the staves of a barrel, none of the three can fall short.

Through coordinated optimization of single-machine compute, network architecture, and storage performance, Tencent Cloud's new-generation cluster provides high-performance, high-bandwidth, low-latency computing support for large-scale model training.

In general, it has the following characteristics:

– Compute: powerful performance

On top of maximizing single-node compute performance, we also combine different types of chips (GPU + CPU) so that each chip is placed where it fits best and does what it does best.

– Network: ample bandwidth

GPUs excel at parallel computing and can run many tasks at once. Our self-developed Xingmai high-performance network lets tens of thousands of GPUs exchange data with one another quickly and without congestion. This well-coordinated teamwork raises the training efficiency of large-model clusters by 20%.
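One reason GPU-to-GPU bandwidth matters is that every training step ends with a collective "all-reduce" that sums gradients across all GPUs. Below is a small single-process simulation of ring all-reduce, a common algorithm for this collective; it is a generic illustration of the pattern, not the Xingmai implementation.

```python
def ring_allreduce(nodes):
    """Simulate ring all-reduce. `nodes[i]` is node i's list of n chunks;
    afterwards every node holds the element-wise sum of all chunks."""
    n = len(nodes)  # number of nodes == number of chunks per node
    # Phase 1, reduce-scatter: in step s, node i sends chunk (i - s) mod n
    # to its right neighbor, which adds it. After n-1 steps, node i owns
    # the fully summed chunk (i + 1) mod n.
    for s in range(n - 1):
        sends = [(i, (i - s) % n, nodes[i][(i - s) % n]) for i in range(n)]
        for i, c, val in sends:
            nodes[(i + 1) % n][c] += val
    # Phase 2, all-gather: circulate the fully summed chunks so every
    # node ends up with all of them.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, nodes[i][(i + 1 - s) % n]) for i in range(n)]
        for i, c, val in sends:
            nodes[(i + 1) % n][c] = val
    return nodes
```

Each node only ever talks to its ring neighbor, so the bandwidth used per link stays constant as the node count grows, which is why the quality of the interconnect, rather than any single GPU, often bounds cluster training speed.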

– Storage: fast reads

When training a large model, thousands of servers read the same batch of data at once; if loading takes too long, storage becomes the short stave of the barrel. Our latest self-developed storage architecture sorts data into different "containers" for different scenarios, making reads faster and more efficient.
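As an illustration of scenario-based placement, here is a tiny hypothetical sketch that routes objects into different storage "containers" by access pattern. The tier names and the mapping are invented for the example and are not Tencent's actual architecture.

```python
# Map each access scenario to an assumed storage tier ("container").
TIERS = {
    "checkpoint": "object_store",    # large blobs, written occasionally
    "training_batch": "nvme_cache",  # read by many servers simultaneously
    "log": "cold_archive",           # rarely read back
}

class TieredStore:
    """Toy key-value store that places data by scenario, so hot training
    batches land in the fastest tier and cold data stays out of its way."""

    def __init__(self):
        self.containers = {tier: {} for tier in set(TIERS.values())}

    def put(self, key, value, scenario):
        self.containers[TIERS[scenario]][key] = value

    def get(self, key, scenario):
        return self.containers[TIERS[scenario]][key]
```

A production system would of course add replication, caching, and prefetching, but the core idea is the same: the scenario, not the key, decides where the bytes live.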


Helping enterprises to train their large models on the cloud

As demand for computing power surges, buying GPUs outright is expensive, and supply is so tight that money alone cannot secure them, which puts heavy pressure on startups and small and medium-sized enterprises. Our new-generation HCC cluster lets them train large models on the cloud, which we hope will ease that pressure.

Our self-developed training framework, AngelPTM, supports the training of the Tencent Hunyuan large model internally and is also offered externally through Tencent Cloud. In October last year, it completed its first trillion-parameter large-model training run, shortening training time by 80%.

We also provide large-model capabilities and a toolbox that help companies fine-tune models for their specific scenarios, improve production efficiency, and quickly build and deploy AI applications.

Our self-developed chips are already in mass production, including the Zixiao chip for AI inference. It uses a self-developed storage-compute architecture and self-developed acceleration modules, delivering up to 3x compute acceleration and more than 45% overall cost savings.

Overall, with the new-generation HCC as a benchmark, we are integrating software and hardware on the basis of self-developed chips, self-developed servers, and other building blocks to create a high-performance intelligent computing network for AIGC, and to keep accelerating cloud-based innovation across society.