Lao Huang unveils the most powerful AIGC chip! Memory capacity up nearly 50%, any large model can run on it. "Generative AI's iPhone moment is here"...

Cressy & Xiaoxiao, from Aofei Temple
Qubit | Official Account QbitAI

Here he comes: Lao Huang has arrived with the "most powerful generative AI processor" and a series of major updates!

At SIGGRAPH, the top conference in computer graphics, Lao Huang announced NVIDIA's newest superchip, the GH200 Grace Hopper.

This chip carries the fastest memory in the world: bandwidth of 5TB per second, with capacity up nearly 50% to 141GB, enough that "any large language model can run on it."

At the same time, Nvidia also announced a partnership with Hugging Face:

In the future, there will be no need to download an ML model from the Hugging Face platform and run it yourself. A few simple steps will be enough to run a large model from a laptop, which has a distinctly Colab flavor (no word on whether there's a free tier).

As for the software updates, AI is written between every line.

Not only has a series of popular AI tools been integrated into the Omniverse platform, but a number of new tools built on large models were also unveiled, such as ChatUSD, which helps developers write code.

This was also Lao Huang's first appearance on the SIGGRAPH stage in five years. At the conference, he declared confidently:

The "iPhone moment" of generative artificial intelligence has arrived.

Some netizens sighed after watching the keynote:

Nvidia is unmatched in terms of AI hardware.

The "most powerful supercomputing" composed of new chips is coming

The first and most eye-catching thing in this press conference is "the most powerful supercomputer".

The supercomputer is made up of 256 pieces of DGX GH200 Grace Hopper (DGX GH200 for short).

In Lao Huang's words, this "behemoth" is tailor-made for the AIGC era.

Its compute and memory capacity reach 1 exaFLOPS (10^18 FLOPS) and 144TB respectively.

The picture below shows its true scale (the small dark figure in the middle is Lao Huang).

Performance aside, a side-by-side comparison shows its cost-effectiveness simply crushes CPUs.

What does the same $100 million buy in CPUs versus GPUs?

On the CPU side, it buys 8,800 x86 processors.

Yet those nearly 9,000 CPUs can only drive a single AI workload on the scale of LLaMA 2 plus SDXL.

As for power draw: 5 megawatts, that is, 5,000 kWh every hour.

Swap in GPUs, and the same budget buys 2,500 GH200s.

The number of AI workloads of that scale they can drive jumps to 12, while power draw falls to 3 megawatts.

Averaged out, a single workload needs 210 GH200s, at a price of 8 million US dollars and 0.26 megawatts of power.
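
As a quick sanity check, dividing the ISO-budget figures above by the 12 workloads lands close to the per-workload numbers quoted, give or take rounding. A back-of-envelope sketch in Python, using only the keynote's own figures:

```python
# Back-of-envelope check of the keynote's $100M comparison,
# using only the figures quoted above.
budget_usd = 100e6       # same $100M budget in both scenarios
gh200_count = 2500       # GH200s that budget buys
workloads = 12           # AI workloads those GPUs can drive
total_power_mw = 3.0     # total power draw in megawatts

print(f"GPUs per workload:  {gh200_count / workloads:.0f}")         # ~208 (keynote: 210)
print(f"Cost per workload:  ${budget_usd / workloads / 1e6:.1f}M")  # ~$8.3M (keynote: $8M)
print(f"Power per workload: {total_power_mw / workloads:.2f} MW")   # 0.25 MW (keynote: 0.26)
```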

The GH200 that makes up this "most powerful supercomputer" is itself king-tier, billed as "the most powerful generative AI processor."

The GH200 combines a Grace CPU with a Hopper GPU.

The Grace CPU has 72 cores and comes with 500GB of LPDDR5X, while the Hopper GPU delivers 4 petaFLOPS (4×10^15 FLOPS) of compute.

On top of that, the GH200 adds SK Hynix's "fastest memory," HBM3e.

Its capacity is 141GB and its bandwidth reaches 5TB per second, which are 1.7 times and 1.55 times those of the H100, respectively.

(Good guy, the H100 is now reduced to serving as the baseline.)

Inside the GH200, the CPU-GPU link is 7 times faster than PCIe Gen5.
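
That multiple matches the published interconnect specs: NVLink-C2C is rated at 900 GB/s, against roughly 128 GB/s for a PCIe Gen5 x16 link (those two figures come from NVIDIA's and PCI-SIG's spec sheets, not from the keynote slide itself):

```python
# CPU-GPU link speed check, using published spec-sheet numbers.
nvlink_c2c_gbs = 900.0   # GB/s, NVIDIA's rated NVLink-C2C bandwidth
pcie5_x16_gbs = 128.0    # GB/s, PCIe Gen5 x16 aggregate (~64 GB/s each way)

print(f"NVLink-C2C is ~{nvlink_c2c_gbs / pcie5_x16_gbs:.0f}x PCIe Gen5 x16")  # ~7x
```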

Going from a single GH200 to the full supercomputer is a matter of "stacking."

That is thanks to its high-speed multi-GPU interconnect.

Connecting two GH200s loses almost nothing: the dual configuration performs at essentially twice a single chip.

A dual GH200, a BlueField-3 DPU, and a ConnectX-7 network card together form one "compute box."

Eight such "compute boxes" linked at high speed over NVLink form a DGX building block with 4.6TB of total memory.

Those building blocks can in turn be combined into a new compute box, and eventually expanded into a 256-GPU working cluster, the SuperPOD.

NVLink's high-speed interconnect lets those 256 GPUs work "as a single unit."

At this point, the GPU supercomputer reaches the scale Lao Huang showed at the start of this section.

But it doesn't stop there: SuperPODs can be linked to one another as well.

With the high-speed, low-latency Quantum-2 InfiniBand platform, the supercomputer can keep scaling out...

At this point, Lao Huang even cracked a joke:

If one day you spot it while buying a graphics card from (an e-commerce platform), don't be surprised!

In short, depending on your needs, the GH200 can be used to build supercomputers of whatever scale the AIGC era demands.

The GH200 is expected to enter production in the second quarter of next year (2024).

Three new RTX professional graphics cards, too

Besides the "most powerful generative AI processor," Nvidia also launched three new workstation graphics cards this time:

RTX 5000, RTX 4500, and RTX 4000.

All three cards are built on the Ada Lovelace architecture, and their specs are already up on Nvidia's official website:

Of course, professional graphics cards come at professional prices.

The RTX 5000 is priced at 4,000 US dollars (about 28,700 yuan), the RTX 4500 at 2,250 US dollars (about 16,000 yuan), and the RTX 4000 at 1,250 US dollars (about 8,987 yuan).

While releasing the RTX cards, Lao Huang once again delivered his classic line:

The more you buy, the more you save.

As for the RTX 6000 Ada released last September, this conference also brought it a new workstation design: stack four of them into a single top-spec chassis.

An RTX workstation built this way delivers 5,828 TFLOPS of AI performance and 192GB of GPU memory.
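
Those two totals are simply four RTX 6000 Ada cards added together, assuming NVIDIA's published per-card figures of 48GB of memory and 1,457 TFLOPS of AI performance (per-card numbers not restated in the keynote):

```python
# Four stacked RTX 6000 Ada cards, per NVIDIA's published per-card specs.
cards = 4
ai_tflops_per_card = 1457   # published AI (Tensor) performance per card
memory_gb_per_card = 48     # GDDR6 memory per card

print(f"{cards * ai_tflops_per_card} TFLOPS AI performance")  # 5828, as quoted
print(f"{cards * memory_gb_per_card} GB GPU memory")          # 192, as quoted
```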

In addition, Lao Huang announced new OVX servers at the conference, equipped with the L40S Ada GPU and aimed at the data center.

Each server carries 8 L40S Ada GPUs, and each L40S packs up to 18,176 CUDA cores, delivering nearly 5 times the single-precision floating-point (FP32) performance of the A100.

For fine-tuning large models, the L40S is roughly 1.7 times faster than the A100.

(Yes, the A100 has been relegated to Lao Huang's baseline for new hardware.)

Concretely, fine-tuning a large model with tens of billions of parameters takes only a few hours on such an OVX server;

a 40-billion-parameter GPT-3-class model, for instance, can be fine-tuned on 860M tokens in just 7 hours.
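
Taken at face value, that 7-hour run implies a training throughput in the tens of thousands of tokens per second across the server's 8 GPUs, and the "nearly 5 times" FP32 claim lines up with the published spec sheets (L40S ~91.6 TFLOPS vs. A100 ~19.5 TFLOPS; those two figures are from the spec sheets, not the keynote):

```python
# Implied throughput of the quoted fine-tuning run, plus the FP32 ratio check.
tokens, hours, gpus = 860e6, 7, 8
tok_per_sec = tokens / (hours * 3600)
print(f"~{tok_per_sec:,.0f} tokens/s total, ~{tok_per_sec / gpus:,.0f} per GPU")

l40s_fp32, a100_fp32 = 91.6, 19.5   # TFLOPS, published spec-sheet figures
print(f"FP32 ratio: ~{l40s_fp32 / a100_fp32:.1f}x")  # ~4.7x, i.e. "nearly 5 times"
```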

Rendering performance is solid too: the L40S carries 142 third-generation RT cores, delivering 212 teraflops of ray-tracing performance.

The L40S is expected to be available this fall.

The AIGC version of Colab is here: run large models from a laptop

Hardware was not the only place Nvidia dropped blockbusters one after another; on the software side it also released a batch of new products.

The first is the partnership with Hugging Face, which brings NVIDIA DGX Cloud AI computing into the platform.

From a Hugging Face page, a model can be tuned and run in the cloud with one click.
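
NVIDIA and Hugging Face had not yet published the integration's API at the time, but the manual workflow it replaces is well known. A minimal sketch with the standard transformers library, for contrast (the model ID is just an example, and gated models require access approval):

```python
# The manual download-and-run workflow that the one-click DGX Cloud
# integration is meant to absorb: pull weights from the Hub, run locally.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # example Hub model (gated; needs approval)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Generative AI's iPhone moment is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```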

Nvidia scientist Jim Fan (Fan Linxi) excitedly announced the news, and revealed that each node behind the service packs 8 H100s or A100s.

Beyond the Hugging Face partnership, Nvidia also launched a platform of its own, AI Workbench.

By connecting to cloud services, it lets a laptop run large models.

A demonstration video of running SDXL through Workbench was also played on site.

In Jupyter, the presenter asked SDXL to draw a "toy Lao Huang."

At that point, SDXL had no idea what a "toy Lao Huang" was.

So the presenter fine-tuned the model on the spot with just 8 pictures.
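
NVIDIA didn't show the Workbench internals, but personalizing a diffusion model on a handful of photos is typically done DreamBooth-style with a LoRA. A minimal sketch of the generation step using Hugging Face diffusers (the LoRA path and the "sks" subject token are placeholders for whatever the fine-tune produced):

```python
# Sketch of the demo's generation step: SDXL plus a LoRA adapter
# fine-tuned DreamBooth-style on a few subject photos.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("./toy-jensen-lora")  # hypothetical output of the 8-image fine-tune

image = pipe("a photo of sks toy figurine on a desk").images[0]
image.save("toy_lao_huang.png")
```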

And the image regenerated after fine-tuning? It definitely has the right flavor.

Besides these two tools for running large models, NVIDIA also launched the latest version of its enterprise software platform, NVIDIA AI Enterprise 4.0.

It now bundles 4,500 software packages with tens of thousands of dependencies, all secure and reliable.

NVIDIA partners such as Google, Microsoft, Amazon, and Oracle will integrate the service into their own cloud platforms.

"Humans will become the new programming language"

Beyond all of the above, Nvidia's computer graphics and simulation platform Omniverse also announced a series of new developments.

On the one hand, more AI tools can be called directly in Omniverse.

A series of popular AI tools, including the conversational AI character creation tool Convai, the high-fidelity AI motion-capture tool Move AI, and the low-cost AI CG tool Wonder Dynamics, have now been integrated into Omniverse through OpenUSD.

Even Adobe plans to offer Adobe Firefly as an API inside Omniverse (which presumably means it will be paid).

On the other hand, Nvidia has also combined generative AI with OpenUSD to launch some handy tools of its own.

ChatUSD, for example, is a large-model Copilot built on the NVIDIA NeMo framework. It can answer developers' questions about USD, and it can also help generate Python-USD code.
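
For context, "Python-USD code" means scripts written against Pixar's pxr API. The kind of snippet such a copilot would generate looks like this (a plain-vanilla USD example for illustration, not actual ChatUSD output):

```python
# Minimal Python-USD: create a stage and add a sphere under /World.
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("scene.usda")
world = UsdGeom.Xform.Define(stage, "/World")
sphere = UsdGeom.Sphere.Define(stage, "/World/Sphere")
sphere.GetRadiusAttr().Set(2.0)          # set the sphere's radius attribute
stage.SetDefaultPrim(world.GetPrim())    # make /World the default prim
stage.GetRootLayer().Save()              # write scene.usda to disk
```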

Another example is DeepSearch, likewise built on large models, which takes text or image input and runs fast 3D semantic search over a database.
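
NVIDIA hasn't detailed DeepSearch's internals, but the standard recipe behind this kind of text- or image-driven search is to embed the query and every asset into a shared vector space and rank by cosine similarity. A generic sketch (the embed() function stands in for whatever multimodal encoder is actually used):

```python
# Generic embedding-based semantic search over an asset database
# (a standard recipe for illustration, not NVIDIA's implementation).
import numpy as np

def embed(item) -> np.ndarray:
    """Placeholder for a multimodal encoder, e.g. a CLIP-style model."""
    raise NotImplementedError

def search(query, asset_embeddings: np.ndarray, top_k: int = 5) -> np.ndarray:
    q = embed(query)
    q = q / np.linalg.norm(q)                        # normalize the query vector
    db = asset_embeddings / np.linalg.norm(
        asset_embeddings, axis=1, keepdims=True)     # normalize each asset vector
    scores = db @ q                                  # cosine similarity per asset
    return np.argsort(-scores)[:top_k]               # indices of best matches
```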

At the conference, Lao Huang first reviewed the "correct decisions" he had made in the past: reshaping CG with AI, and reinventing the GPU for AI.

Then he offered a bold outlook on where the AI industry is heading:

In the future, there will be a large language model in front of almost everything.

"Human" will become a new programming language.

Taking factories as an example, Lao Huang believes the factory of the future will be "dominated" by software and robots.

Products like cars are themselves robots, so the factories that build cars will be a scene of robots making robots.

It seems that Nvidia, riding the tailwind of large models, really wants to go ALL IN on generative AI this time.

