In the early morning of May 14th, OpenAI launched its new flagship model GPT-4o and a desktop app at the much-anticipated "Spring New Product Launch", and demonstrated a series of new capabilities. During the livestream, it was mentioned that GPT-4o would be made available to users for free. The editor logged in first thing in the morning, but the model was not there. Guessing that it was still in a staged rollout, the editor paid for a subscription, turned AI review blogger, and put the GPT-4o model that OpenAI is officially promoting through a hands-on test!
First of all, the official OpenAI blog mentioned that GPT-4o is particularly good at visual and audio understanding compared to existing models.
The editor watched the OpenAI demo video, in which OpenAI staff chatted with GPT-4o over video like old friends, and was very impressed and eager to give it a try!
But!!! Sadly, the video interaction feature is not yet open to ordinary users. For now we can only interact with ChatGPT by uploading pictures and files.
The real-time voice translation feature also attracted plenty of attention from onlookers. According to OpenAI, a phone can serve as an interpreter, translating between nearly 20 commonly used languages.
The editor tried it and found that real-time voice translation is not yet available... In voice conversations with ChatGPT, there is still a wait of a few seconds before each reply.
OpenAI also stated in the official blog that it plans to provide GPT-4o's new audio and video capabilities to a small number of trusted partners through the API in the next few weeks. Besides these much-anticipated features that cannot be tried yet, the official blog post also showcases a series of text-to-image samples, along with image, audio, and video recognition capabilities. Next, the editor puts these capabilities through a hands-on review! We copied the inputs from the official blog as prompts and compared our own results with the official images for your reference~
Comic storyboard: Robot's writer's block
This example is meant to showcase GPT-4o's powerful image generation capabilities, including better rendering of text inside images and the ability to keep characters consistent across multiple images. But the result...
In the first picture, the text generated in our own test still contains typos, and the handwriting is blurry.
In the second picture, the robot's hand has changed noticeably and is no longer consistent, and the paper has changed as well.
The third picture basically passes, but the text on the paper is completely different from that in the first two pictures...
Comic Storyboard: The Story of Postman Sally
Very good! GPT-4o generated a beautiful manga-style postwoman, even prettier than the one in the official post.
Wait, why has the art style changed? The manga look has turned into something like puppets, and the perspective is off.
The third picture is in yet another style. Although each picture matches its own text well, together they hardly tell a coherent story...
Comic avatar
The next feature is the editor's favorite, and it is also where GPT-4o performs best. Upload a photo and it designs a comic avatar for you, with a customizable background.
This is the original photo of Alex Nichol, a member of OpenAI's technical staff.
This is the comic avatar generated by GPT-4o. Although our result is not as lifelike as the official one, it still captures his basic features.
Artistic font
The effect is amazing, even better than the official picture!
But why are there fewer and fewer letters?
3D renderings
The aesthetics are good enough, but can the logo still be used if it looks like this?
Creative typography
The handwriting is quite beautiful, but the accuracy of the text is still a bit poor...
Character emotion recognition
The editor uploaded a photo of a person with a very expressive face. GPT-4o read the emotions very accurately and even made up a story to go with them.
Conference recording recognition
The editor uploaded a recording of a multi-person meeting and asked how many people were speaking in it. GPT-4o gave an answer based on audio track analysis, and it was rather wide of the mark...
Judging from the overall hands-on experience, the GPT-4o currently available to ordinary users is not as good as advertised. This release feels more like a hasty PR move without much sincerity. The editor does not suspect OpenAI of editing its demo video the way Google, tomorrow's protagonist, has been suspected of doing, but the GPT-4o on the OpenAI employees' phones in the video is obviously different from the one the editor can use now. As for when the internal and external versions will be the same, we can only wait and see.