Comprehensive evaluation of generative 3D large models in the AI era - the eve of the "ChatGPT moment"

In my past articles, I have always divided AI into four modalities:

AI text (large language models), AI drawing, AI audio, AI video

In my recent conversations and interviews, something outside these four modalities has come up again and again.

AI 3D.

On the evening of Wednesday, December 20, a friend interviewed me and we chatted happily for an hour. At the end, he suddenly asked a question that was not in the outline: "What do you think of 3D in the AI era?"

To be honest, I was caught off guard. I had never seriously thought about the question, so I just shared my rough understanding and moved on.

However, he was not the first person to bring this topic up with me. In the past month, AI 3D has come up countless times across my information channels.

So I decided to write this article about what I consider the fifth major modality, AI 3D, and the current state of the field.

Without further ado, let’s get started.

Currently, there are about five mainstream players in the field of AI 3D: Tripo, Meshy, sudoAI, CSM, and LumaAI.

picture

CSM and Luma are long-established companies. Luma used to focus on real-world 3D scanning, and I have been playing with it for a while. Some time ago they launched Genie, a text-to-3D product. It still lives inside Discord and does not support image-to-3D yet. CSM offers real-time drawing-to-3D conversion, but it does not support text-to-3D.

Meshy also started working relatively early. I remember the product was released in July or August. Tripo and sudo are relatively new, especially Tripo, which was only released a few days ago on December 21st.

When talking about AI 3D products, the core function and pain point that cannot be avoided is, naturally, modeling.

Let me briefly go through the 3D workflow to give everyone an idea. It is roughly: concept design - 3D modeling - texture mapping - rigging - animation - lighting - rendering - compositing.

The film and television special effects you see, and the scenes in games, all need to be modeled, textured, and then rendered. The first output of modeling is an untextured blank model, which looks roughly like this.

picture

Only after you have the model can you do all the rest.

Modeling is therefore very important, but it is also the most time-consuming step; in many cases it takes up 30% to 50% of the total time. In the 3D field, nothing is more important, more tedious, or more in need of AI optimization than modeling.

The products offer similar AI modeling functions, namely text-to-3D and image-to-3D.

Text-to-3D and image-to-3D are easy to understand. The concept is the same as in AI video, except that AI video generates a 4-second clip from text or an image, while AI 3D generates a model.

The standard for judging them is simple: the quality and accuracy of the generated model.

Generally speaking, the one we use most is image-to-3D.

So I first generated an image with Midjourney (MJ) V6, using this prompt:

Basketball game assets, blender 3d model, obj fbx glb 3d model, default pose, PNG image with transparent background

picture

(PS: I really didn’t choose to do basketball first because of Brother Chicken)

Then I fed this image into Tripo, Meshy, sudo, and CSM. Luma does not support image-to-3D yet, so it is left out of the image-to-3D comparison.

To be honest, my expectations for AI 3D were not high, so I started with something very simple, a basketball. In the end, apart from Tripo, the other three were really unsatisfactory. And I cannot help complaining about CSM: it takes nearly 2 hours to generate one model...

I downloaded all the models and rendered them into animated GIFs in Blender, with the same camera, HDR, and parameters for every model. You can see the difference between the four products directly.

picture

Only Tripo actually joined up the basketball's texture lines into a real basketball. The textures from Meshy and sudo visibly fall apart, and not in a tolerable way: they are completely unusable. CSM is also a mess on the back side.

Let's go into Blender to look at the modeling details.

picture

CSM modeled a faint hint of the basketball's grooves. Tripo's and sudo's modeling was acceptable: a ball that is not perfectly round and has some flaws, but is usable. Meshy's was completely unusable.

On the basketball, Tripo is far ahead.

Tripo > CSM > sudo > Meshy

Let’s try a few more examples.

1. A cartoon dragon character. It is the Year of the Dragon, after all.

picture

Tripo continues to be solid. Meshy's model has a bunch of holes. The sudo texture is okay, but the modeling of the lower body and the tail behind it is completely broken. The moment the CSM model rotated, it showed two faces, which scared me half to death, but the model structure was okay.

Tripo > CSM > sudo > Meshy
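The "holes" complaint can also be checked outside the viewport. Below is a minimal sketch, assuming the downloaded model has already been loaded as a list of triangles given by vertex indices (the loading step and the names `boundary_edges` and `is_watertight` are hypothetical, not part of Blender or of any of these products). In a closed triangle mesh every edge is shared by exactly two triangles, so any edge used by only one triangle lies on the rim of a hole.

```python
from collections import Counter

def boundary_edges(faces):
    """Return the edges that belong to exactly one triangle.

    `faces` is a list of (a, b, c) vertex-index triples. An edge used by
    only one triangle sits on an open boundary, i.e. the rim of a hole.
    """
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with sorted endpoints so direction is ignored.
            counts[(min(u, v), max(u, v))] += 1
    return sorted(edge for edge, n in counts.items() if n == 1)

def is_watertight(faces):
    """True when the mesh has no open boundary edges."""
    return not boundary_edges(faces)

# Illustrative data only: a closed tetrahedron, then the same mesh with
# one face removed to simulate a hole.
TETRA = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]

if __name__ == "__main__":
    print(is_watertight(TETRA))        # closed mesh
    print(boundary_edges(TETRA[:-1]))  # rim of the missing face
```

This only detects open boundaries. It says nothing about the other flaws described in this article, such as broken textures or merged geometry.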

2. A sweater. After all, clothing is an unavoidable part of modeling.

picture

Tripo's result is almost perfect in both modeling and texture. If I had to nitpick, the two cuffs are not open holes (laughs). Meshy's model has holes as usual, and I found a big problem with its textures: the front is always exquisite, but the back falls apart. The sudo clothing model still has holes on both sides, plus connections that should not be there. CSM's textures have the same problem as Meshy's: the back and the front are very different.

Tripo > CSM > sudo > Meshy

3. A rose. Flowers are among the nastiest things to model, and basically the hardest level for today's AI 3D. The rose closes out the image-to-3D tests.

picture

Tripo's flower has a reasonable structure front and back, but the leaves are stuck together and collapsed, leaving some strange artifacts. Meshy is still all about appearances: it looks quite impressive from the front, but once you turn it around, it has holes again. The details of sudo's flower are broken, and the structure of the flower is basically unrecognizable.

As for CSM... Really, do not ask me what that thing is. I do not know, but I know it is not a flower.

Judging from these four examples, at least in image-to-3D, Tripo is leading the way.

Overall: Tripo > sudo > CSM = Meshy.

Now let's look at text-to-3D. CSM does not support text-to-3D, but LumaAI's Genie does, so this comparison covers Tripo, Meshy, sudoAI, and LumaAI.

Text-to-3D really depends on the foundation of the model itself. In image-to-3D, the input image comes from somewhere else, so what it tests is more the tolerance and generalization ability of the large model. If your image-to-3D result is poor, you have an excuse: the image style generated by MJ does not match your 3D model, so the result suffers. Text-to-3D depends entirely on your own foundation. Everything happens inside your own system, so if the result is bad, you really are not good at it.

The text-to-3D process is a bit like Runway's text-to-video. After you give a prompt, Runway produces 4 first frames, and then you choose which image to use to generate the rest of the video.

Text-to-3D first takes a little over ten seconds to generate four rough preview models from your prompt, and you decide which one to refine. It looks roughly like this.

picture

The initial preview models are rough, but they let you pick roughly the look you want.

Let's try the first prompt. Christmas is coming soon, after all, so let's do something festive:

spiderman dressed in christmas style with a christmas hat, highest quality

picture

Both Tripo and Luma produce very good results. Tripo is more realistic overall, while Luma is more cartoonish. Luma's only flaw is two inexplicable white spots on the knees. Meshy's turned into something like a Calabash Brother (gourd baby)... The texture precision of sudo is not very good, and there is a bug at the seam where the hat attaches.

Tripo > Luma > sudo > Meshy

Next, a catgirl. After all, how can we do 3D without pretty girls:

an anime catgirl

picture

Tripo and Luma are still rock solid. Meshy's is a bit odd: the texture seems to have no sense of material at all and looks like paper. And sudo simply produced a pillow... I have no words.

Tripo > Luma > Meshy > sudo

For the last case, let's make a 3D asset for a game, a golden pistol:

golden pistol, unreal engine, highest quality

picture

I won't comment on the specific details of the pistol; you can see for yourself. Luma and Tripo are still strong. On the details of the muzzle, Luma is more refined than Tripo.

Luma > Tripo > Meshy > sudo

In text-to-3D overall, Tripo and Luma are clearly in the lead. In some details, Tripo is better than Luma.

Across image-to-3D and text-to-3D as a whole, Tripo is currently the undisputed king.
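One way to tally the three per-prompt rankings above is a simple Borda count: in each ranking of four products, first place earns 3 points and last place earns 0. This is only an illustrative sketch, not the method used for the verdicts in this article, which are subjective judgments. The function names and the equal weighting of the three prompts are assumptions.

```python
from collections import defaultdict

# Per-prompt rankings copied from the text-to-3D comparison (best first).
RANKINGS = {
    "christmas spiderman": ["Tripo", "Luma", "sudo", "Meshy"],
    "anime catgirl": ["Tripo", "Luma", "Meshy", "sudo"],
    "golden pistol": ["Luma", "Tripo", "Meshy", "sudo"],
}

def borda_scores(rankings):
    """Sum Borda points: n-1 for first place down to 0 for last."""
    scores = defaultdict(int)
    for order in rankings.values():
        n = len(order)
        for place, name in enumerate(order):
            scores[name] += n - 1 - place
    return dict(scores)

def overall_order(rankings):
    """Product names sorted by total score, highest first (ties by name)."""
    scores = borda_scores(rankings)
    return sorted(scores, key=lambda name: (-scores[name], name))

if __name__ == "__main__":
    print(borda_scores(RANKINGS))
    print(overall_order(RANKINGS))
```

Under this equal-weight tally Tripo scores 8 and Luma 7, with Meshy at 2 and sudo at 1, which matches the impression that the first two are far ahead of the rest.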

The website of Tripo is here: https://www.tripo3d.ai/

If you want to try Luma's text-to-3D, go to Discord and search for their channel to join.

I do not recommend trying the other three; there is not much point.

But even Tripo and Luma still have many flaws: the mesh topology is a bit messy, the facial textures of characters tend to fall apart, the rendering of metal materials is not refined enough, and so on.

But I believe time will solve everything. For a first-generation product like Tripo, released only three days ago, you cannot expect it to reach the top in one step, not to mention that the competition in AI 3D has only just begun.

At present, the progress of AI 3D, led by Tripo and Luma, is roughly equivalent to Midjourney V2 or V3 in AI drawing, and the other companies are still at the V1 level.

Midjourney's breakout was marked by V4, which began to upend the entire industry, all the way up to V6 a few days ago, which stunned everyone.

AI 3D is now on the eve of its GPT moment.

The day of the breakout may come sooner than you and I think.

Final words

In 2019, I once made a 3D work to commemorate the departure of one of my gaming partners.

picture

This is what I said at the time:

picture

I spent a whole month of evenings and weekends making this picture.

I modeled 90% of the models in it by hand. The workload was extremely painful, and modeling consumed 70% of my total time.

If I had the choice again, I would definitely not do it. I do not want to go through that kind of torture a second time.

And this is just me, an amateur designer.

Do you know how many things need to be modeled in games and movies?

"Elden Ring" is an example. It has hundreds of bosses and countless scenes, and each scene holds countless 3D assets, from bosses and castles down to weapons, armor, candles, and tables.

Even with FromSoftware's top-tier productivity and level of industrialization, it took a full five years to ship Elden Ring.

"Baldur's Gate 3" took Larian 6 years to develop, with a team of 400 people at its peak.

"The Wandering Earth 2" had a full production cycle of 3 years.

I also asked many film and television post-production practitioners one question: which step do they most need AI to optimize right now? The answers were surprisingly uniform:

Modeling.

I am extremely optimistic about AI 3D, not because the field is new, but because it can truly free up the productivity of content creators, letting them spend more energy on creation itself and protecting their creative energy.

Modeling is only one part of it; there are also AI texture mapping, AI rigging, AI motion capture, and so on.

When AI reshapes the entire 3D pipeline and connects the whole process, efficiency will take off.

And it’s not just professionals in gaming and film and television who need it.

There is something even bigger, for which 3D assets are the infrastructure. Without an ultra-efficient AI 3D pipeline and AI-assisted construction, it is basically impossible to achieve.

This thing is: the Metaverse.

I have never thought that the Metaverse is nonsense. It is the future that I firmly believe in, but it is still a bit too far away, because the infrastructure and production capacity cannot keep up. Until the world itself has been built, the Metaverse is just empty talk.

AI 3D is the best creative engine for the Metaverse.

I have always believed that 3D content will expand without limit in the future, and that everyone can become a super creator, creating new worlds like a god and building their own metaverse.

That day won't be too far away.

Next year, we can expect to witness the future of AI 3D accelerating.

(This article is reproduced from the WeChat public account: Digital Life Kazik, for learning purposes only)

Origin blog.csdn.net/richerg85/article/details/135215456