How does AIGC text-to-image technology turn AI into a "photographer"?

The daguerreotype was invented by the Frenchmen Louis Daguerre and Joseph Nicéphore Niépce in 1837.
On August 19, 1839, the French government purchased the patent and declared the invention a "free gift to the world".
In honor of this important date, August 19 has been designated as World Photography Day, which celebrates the art, craft, science and history of photography.

Nearly two centuries have passed since photography was invented. In that time, countless classic photographs have been preserved across the eras, bearing witness to the development of human civilization and the history of photography.

As early as 1825, the Frenchman Nicéphore Niépce invented heliography (the "sun etching" method) and used his camera obscura to record the world's first photographic work, "Boy Leading a Horse".


In 1839, Daguerre, often called the father of the camera, introduced daguerreotype photography. His equipment, which included cameras, developing boxes, chemicals, and tools for polishing metal plates, weighed a total of 50 kilograms. His work "Still Life in the Studio" ("A Corner of the Studio") is the earliest surviving daguerreotype and the world's first still-life photograph.

In 1884, Kodak founder George Eastman invented the world's first roll film. To promote it, he launched the "Kodak No. 1" camera in 1888, opening the era of small, portable cameras.

 In 1975, Kodak produced the world's first digital camera, marking the advent of digital imaging technology.

In 2000, Sharp launched the world's first camera phone, the J-SH04, with the Japanese carrier J-Phone. Today almost everyone owns a camera phone with increasingly professional features; the barrier to entry keeps falling, and people can record what they see and feel anytime, anywhere.

The history of photography mirrors the evolution from PGC (Professionally Generated Content) to UGC (User-Generated Content), and rapid technological iteration has now pushed photography to a new stage, AIGC (AI-Generated Content), which is exerting an enormous influence on global content production, artistic creation, and design.

Founded in 2008, the Sony World Photography Awards is the flagship global photography competition organized by the World Photography Organisation (WPO). It is widely regarded as an authority in the field and a driver of the photography industry. However, the 2023 edition, which concluded in April, produced an incident that sparked heated public discussion.

The first prize in the Creative category of this year's Sony World Photography Awards open competition was won by German artist Boris Eldagsen with "PSEUDOMNESIA | The Electrician". After winning, Eldagsen publicly declined the award on multiple social media platforms. The reason was startling: "The Electrician", and indeed the entire "PSEUDOMNESIA" series, was generated by AI.

The topic of AI-generated photography quickly drew the attention of global media. Eldagsen said in an interview: "For me, working with an AI image generator is a form of co-creation, in which I am the director. It is not just a matter of pressing a button and being done. It is about exploring the intricacies of the process, starting with refining text prompts, then developing a complex workflow, and mixing platforms and techniques. The more you define the process and the parameters of the work you create, the larger your own creative share becomes."

So-called AI photography uses text-to-image generation: a text description is entered, and a corresponding picture is produced. As one of the main directions of AIGC, it has broad application prospects in content production and other fields. In "AI photography", the human plays the role of a director, conveying the desired "feeling" to the cinematographer, while the AI, acting as that cinematographer, turns the director's idea into reality. In 2023, the Lishui Photography Festival (an international festival jointly sponsored by the China Photographers Association and the Lishui Municipal People's Government) established its first AI image art award, committing to embrace the technology through worldwide calls for entries, selection, exhibitions, and forums, and to promote AI in Chinese photographic art.


Stable Diffusion is a state-of-the-art image generation method that combines a diffusion algorithm, neural networks, and prompting techniques. Through a stable, stepwise diffusion process, guided by text prompts and refined with fine-tuning techniques, it can generate high-quality, creative images. The method shows great potential not only in artistic creation but also in design, media, and other fields. As the technology continues to evolve, text-to-image generation will play an increasingly important role, bringing more creative inspiration and possibilities to creators.
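As a concrete illustration, a pipeline like this is typically driven from Python with the Hugging Face diffusers library. The sketch below assumes that library; the checkpoint name, prompt, and sampler settings are illustrative choices rather than details taken from the text above.

```python
# Minimal text-to-image sketch with the Hugging Face `diffusers` library.
# The checkpoint id, prompt, and sampling settings are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",      # any Stable Diffusion 1.x checkpoint works here
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")                      # use "cpu" if no GPU is available (much slower)

prompt = "a 19th-century daguerreotype-style still life in a photographer's studio"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("generated.png")
```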

Image generation has long been a hot topic in computer vision. As an emerging approach, text-to-image generation takes the diffusion algorithm as its core idea, generating images stably and step by step.

Diffusion originally refers to the spontaneous transfer of substances between regions, a process common in physics and chemistry. In imaging, a diffusion algorithm adds noise to an image or removes it according to a fixed schedule. In text-to-image generation, this process is applied by gradually changing pixel (or latent) values until an image relevant to the user's prompt emerges. This step-by-step procedure provides both stability and creativity, and gives a reliable framework for generating high-quality images.
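The following toy PyTorch snippet makes the forward, noise-adding half of that schedule concrete; the schedule length and tensor sizes are assumed for illustration only.

```python
# Toy sketch of the forward ("noising") half of a diffusion process in the
# usual DDPM notation: at step t the clean image x0 is mixed with Gaussian
# noise according to a pre-computed schedule.
import torch

T = 1000                                        # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal-keeping factor

def add_noise(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) for a single timestep t."""
    eps = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1.0 - alphas_bar[t]).sqrt() * eps

x0 = torch.rand(1, 3, 64, 64)                   # stand-in for a normalized image
x_noisy = add_noise(x0, t=500)                  # halfway through the schedule
# Generation runs this process in reverse: a network predicts and removes the
# noise step by step, so an image gradually emerges from pure noise.
```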

The core network of a text-to-image model is the U-Net, a powerful architecture widely used in image segmentation and processing tasks. The key idea of the U-Net is to progressively downsample and then reconstruct its input, which makes it well suited to denoising and restoration. In text-to-image generation, the U-Net combines the user's text cues with image features to generate images in a stable manner. This is technically challenging and also requires modeling the relationship between text and images, which is what enables creative image generation.
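A deliberately tiny sketch of that structure is shown below: an encoder path, a decoder path, a skip connection, and a text embedding injected at the bottleneck. It is only a stand-in that assumes simple additive conditioning; production Stable Diffusion U-Nets operate on VAE latents and use cross-attention, and every size here is an illustrative choice.

```python
# Heavily simplified U-Net-style module: downsample, inject a text embedding,
# upsample, and reuse full-resolution features through a skip connection.
import torch
import torch.nn as nn

class TinyTextConditionedUNet(nn.Module):
    def __init__(self, channels: int = 64, text_dim: int = 512):
        super().__init__()
        self.inp = nn.Conv2d(3, channels, 3, padding=1)                       # full-resolution features
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)     # encoder / downsample
        self.text_proj = nn.Linear(text_dim, channels)                        # map text embedding to channels
        self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)  # decoder / upsample
        self.out = nn.Conv2d(channels * 2, 3, 3, padding=1)                   # after skip concatenation

    def forward(self, x, text_emb):
        h0 = torch.relu(self.inp(x))
        h = torch.relu(self.down(h0))
        h = h + self.text_proj(text_emb)[:, :, None, None]                    # inject the text condition
        h = torch.relu(self.up(h))
        h = torch.cat([h, h0], dim=1)                                         # skip connection
        return self.out(h)                                                    # e.g. predicted noise

unet = TinyTextConditionedUNet()
noise_pred = unet(torch.randn(1, 3, 64, 64), torch.randn(1, 512))
```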

To improve the quality and relevance of generated images, text-to-image models rely on prompting techniques, chiefly CLIP (Contrastive Language-Image Pretraining) and related methods. CLIP's text encoder converts a text prompt into feature vectors (embeddings). These vectors capture the semantics of the text, allowing the model to understand the user's cues and incorporate them into the image generation process, which keeps the output consistent with the user's intent.
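A minimal sketch of this encoding step, assuming the Hugging Face transformers implementation of the CLIP text encoder and an illustrative checkpoint name, looks like this; in Stable Diffusion the resulting token embeddings are fed to the U-Net's cross-attention layers.

```python
# Turn a prompt into text embeddings with a pretrained CLIP text encoder.
# The checkpoint name and prompt are assumptions for illustration.
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a misty mountain village at dawn, film photography"
tokens = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt")
text_embeddings = text_encoder(**tokens).last_hidden_state   # shape (1, 77, 768)
```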

Text-to-image generation also involves choosing the number of diffusion (sampling) steps. Through gradual denoising, the model produces the image's details and features, so the picture slowly emerges from the noise. In addition, fine-tuning techniques such as DreamBooth, LoRA, Textual Inversion embeddings, and hypernetworks can further improve results: they adjust the model's parameters or attach small extra modules so that the output matches specific needs, such as a particular painting style or the likeness of a specific person.
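As one example of how lightweight such adapters can be, the sketch below shows the core LoRA idea under assumed layer sizes: the pretrained weight is frozen and only a small low-rank update is trained.

```python
# Core LoRA idea: freeze the pretrained weight matrix and train only a
# low-rank update on top of it. Rank and layer sizes are arbitrary choices.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                          # pretrained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # base output plus the scaled low-rank update (B @ A) applied to x
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))                             # only lora_a / lora_b get gradients
```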

With the release of GPT-4, multimodal generation has become one of its highlights. Although current diffusion models have revolutionized visual creation, most support only a single cross-modal direction, from text to image, and remain far from general-purpose generative models. Multimodal large models are expected to enable transformation between arbitrary modalities, which is widely seen as the future direction of general-purpose generation.

The TSAIL team led by Professor Zhu Jun of the Department of Computer Science at Tsinghua University proposed UniDiffuser, an innovative probabilistic modeling framework that can jointly model the distributions of multiple modalities, yielding significant improvements across a range of generation tasks. As the technology develops further, multimodal generative models are expected to open up more possibilities for creative transformation between images, text, and other modalities, bringing new opportunities to many application fields.


In recent years, diffusion-based text-to-image generation has made significant progress: high-quality images can be produced from simple natural-language descriptions, and the technology is widely used in e-commerce, virtual reality, entertainment, and other fields. However, current pretrained large-scale text-to-image models cannot controllably generate specific objects, characters, or scenes. For many applications of large models, personalized, controllable generation is essential. How to design an algorithm that, from only a few samples of a specific subject, lets a large text-to-image model reproduce that subject's characteristics while remaining editable has therefore become an important research direction.
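One widely used recipe for this problem is DreamBooth-style fine-tuning: a rare placeholder token is bound to a handful of photos of the subject, and the denoising network is fine-tuned on them. The rough sketch below assumes that approach; the helper names (unet, encode_image, encode_text, add_noise, subject_images) are hypothetical stand-ins rather than a real pipeline's API.

```python
# Rough sketch of DreamBooth-style personalization: bind a rare placeholder
# token ("sks") to a few subject photos and fine-tune the denoiser on the
# standard noise-prediction loss. `unet`, `encode_image`, `encode_text`,
# `add_noise`, and `subject_images` are hypothetical stand-ins.
import torch
import torch.nn.functional as F

num_timesteps = 1000
subject_prompt = "a photo of sks person"                 # "sks" acts as the subject identifier
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-6)

for image in subject_images:                             # typically only a handful of photos
    t = torch.randint(0, num_timesteps, (1,))
    latents = encode_image(image)                        # e.g. a VAE encoder in latent diffusion
    noise = torch.randn_like(latents)
    noisy_latents = add_noise(latents, noise, t)
    noise_pred = unet(noisy_latents, t, encode_text(subject_prompt))
    loss = F.mse_loss(noise_pred, noise)                 # denoising (noise-prediction) objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```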

The 2nd Guangdong-Hong Kong-Macao Greater Bay Area (Whampoa) International Algorithm Competition officially kicked off on July 15, 2023, with "Efficient and Reliable Text-to-Image Methods" proposed as a competition topic by Professor Zhu Jun's team at Tsinghua University. Focusing on personalized character generation with large text-to-image models, competitors must develop model-tuning algorithms that can generate and preserve the characteristics of a specific character while pursuing more flexible editing and lower training and migration costs. The task is to design personalized image generation and fine-grained generation control under specific semantics, advancing diffusion models in personalization and controllable generation.


The competition is currently in the registration stage and is open worldwide. We sincerely invite innovative university students with a solid foundation in AI algorithms, as well as practitioners and makers from related companies and research institutes in the AI field, to sign up!

Introduction to the competition

The Guangdong-Hong Kong-Macao Greater Bay Area (Whampoa) International Algorithm Competition is an international algorithm competition established in 2022 by Pazhou Laboratory (Whampoa) on behalf of the Huangpu District Government of Guangzhou. It aims to promote the construction of the big data and artificial intelligence algorithm ecosystem in the Greater Bay Area by leveraging the laboratory's leading role in the digital economy.


The competition responds to the digital innovation and development strategies of the nation, the Greater Bay Area, Guangzhou, and Huangpu District, positioning itself at the global forefront of the digital economy and artificial intelligence. Built around new-generation information technologies such as big data, artificial intelligence, the Internet of Things, and cloud computing, it targets major national needs and cutting-edge problems in the field, focuses on industries such as smart cities, smart health, smart manufacturing, and smart finance, selects high-quality algorithms nationwide, gathers advanced big data and AI technology from around the world, and attracts top international algorithm talent. The competition has a total prize pool of 10 million RMB, with up to 1 million RMB per track (only registered team members may receive prizes), and aims to attract outstanding talent and top AI teams worldwide while cultivating innovative artificial intelligence industry clusters.

Competition problems

The competition innovatively adopts a dual-track system, an arena-based track and a competition-based track, comprising ten challenging problems. It offers contestants multi-scenario, multi-field, multi-industry content and promotes the integrated development of industry, academia, research, and application.

Arena-based track problems:

Problem 1: Continual Learning of Sequential Tasks

Problem 2: Language-Enhanced Discovery of New Image Categories

Problem 3: Efficient and Reliable Text-to-Image Methods

Problem 4: Enhancing the Comprehensive Capabilities of Large Language Models

Problem 5: Cross-Scene Monocular Depth Estimation

Competition-based track problems:

Problem 1: 3D Reconstruction of Objects with Neural Implicit Representation

Problem 2: Watching Videos and Describing Them

Problem 3: Roadside Millimeter-Wave Radar Calibration and Target Tracking

Problem 4: Emergency Multi-Organ and Multi-Disease Screening

Problem 5: Video Frame Interpolation in Fast Motion Scenes

2023 Competition Timeline

  • July 15 - September 20: Registration and preliminary round (registration stays open throughout the preliminary round)
  • September 21 - October 6: Preliminary round evaluation
  • From October 7: Finals and final evaluation
  • Early to mid-November: Final defense and announcement of results
  • December: Awards ceremony and prize distribution

Registration notes

(1) Log in to the official homepage of the contest: https://iacc.pazhoulab-huangpu.com/contest/

Click the "Register Now" button for the chosen topic on the topic selection page and submit your registration information to enter the competition.

(2) Make sure that registration and team information is accurate and valid. If duplicate accounts or false identities are found, eligibility, results, and prizes will be cancelled.

(3) Participants: the competition is open to all. Individuals, universities, research institutes, maker teams, enterprises, and others may register. Within each track, a participant may join only one team, and each team may have at most 5 members.

(4) The 2nd Guangdong-Hong Kong-Macao Greater Bay Area (Whampoa) International Algorithm Competition has 10 problems in total, and the same contestant (same name, mobile phone number, and ID number) may register for multiple tracks.

Note: for competition-based track problems, staff from the corresponding supporting organizations (those involved in writing the problems or handling the data) may not participate, nor may they entrust others to participate on their behalf. Employees of the organizer (including interns) may compete, but they are ranked only in the preliminary and semi-final rounds and cannot advance to the defense and later stages.



Source: Algorithm Competition Center


Text: Zhang Shiyue and Wang Bing

Editors: Liu Kecheng, Zhang Shiyue

First review: Xu Xing, Wang Dong

Final review: Zhang Hai

Heywhale will fully support the competition and sincerely wishes all contestants good results!
