OpenAI released ChatGPT! It debugs code hands-on!

Hello friends, I am rumor.

I hadn't visited OpenAI's official website [1] in a long time, but today I felt a call from somewhere. Wondering when GPT-4 would be released, I opened it on a whim, and sure enough, there was something new:

d955d740b2a035bebcd8e74725b90ce2.jpeg

Try it: https://chat.openai.com/

OpenAI has reached into dialogue once again! Let's take a look at the official examples:

It helps people debug code, asking follow-up questions over multiple turns:

d38ad5a6b7358d950df15e4690fddd52.jpeg

It identifies a dangerous question and declines to answer, but gives a better response once the user explains their intent:

015d088c676da3df29e7c5176886c8bf.jpeg

It follows instructions, and does not get annoyed even when they are changed many times:

f738e6fc0d3dad0d203ccc1954c21af1.jpeg

There is also a coreference resolution example, which is too long to show here. At the end, OpenAI also compares ChatGPT with InstructGPT: InstructGPT just executes instructions coldly, while ChatGPT comes across as warmer.

From the examples above, we can see that ChatGPT has several obvious advantages over the dialogue systems that other big companies released this year:

  1. It is based on GPT-3.5, so it has more training data. I don't know the other details, but the data evidently includes code debugging

  2. Strong multi-turn context understanding, as the coreference resolution and letter-writing examples show. A model without good memory and understanding of the earlier messages would likely treat a follow-up as a brand-new topic

  3. It is more human-like. At present, most models answer directly, while ChatGPT clearly has a "chat" process with the user. For example, when debugging code, it first replies "It's hard to say, give me more information."

It is hard to make a dialogue strategy more human-like, because we don't really know how to define "human-like". To tackle this problem, Google once broke the notion down into a set of separate metrics.

This time, OpenAI adopted the same strategy as DeepMind Sparrow [2]. Since we don't know which dimensions to use to measure the quality of a dialogue, just train directly on human feedback and let the model learn for itself.

Feedback-based training: isn't that just reinforcement learning, the place where everything ends up?

d2468da75af34b5481ce56fb8f60615e.jpeg

ChatGPT is trained in the following steps:

  1. Use supervised data to fine-tune a dialogue model based on GPT-3.5. The training data is written by hand by annotators

  2. Have annotators rank multiple responses generated by the model, and use those rankings to train a model that scores dialogue responses

  3. Use the scoring model as the reward signal and train the dialogue model with the PPO algorithm
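Steps 2 and 3 can be made a bit more concrete with a small sketch. It is not OpenAI's implementation. It assumes the scoring model is trained with a pairwise ranking loss (the preferred response should score higher than the rejected one) and that the policy update uses the standard PPO clipped objective. The function names, and the idea of using "score minus a baseline" as the advantage, are illustrative assumptions.

```python
import math


def pairwise_ranking_loss(score_preferred: float, score_rejected: float) -> float:
    """Step 2 (assumed form): loss for one human comparison.

    Computes -log(sigmoid(score_preferred - score_rejected)), which is small
    when the scoring model agrees with the annotator and large when it does not.
    """
    diff = score_preferred - score_rejected
    # -log(sigmoid(diff)) == log(1 + exp(-diff)), written in a stable form
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def ppo_clipped_objective(
    logp_new: float,
    logp_old: float,
    advantage: float,
    clip_eps: float = 0.2,
) -> float:
    """Step 3: standard PPO clipped surrogate for one sample (to be maximized).

    logp_new / logp_old are log-probabilities of the same response under the
    updated and the previous dialogue model. The advantage is assumed here to
    be the scoring model's output minus some baseline.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # The scoring model agrees with the annotator: low loss.
    print(pairwise_ranking_loss(2.0, 0.0))
    # The scoring model disagrees with the annotator: high loss.
    print(pairwise_ranking_loss(0.0, 2.0))
    # A response that became twice as likely, with a positive advantage:
    # the gain is capped by the clip range.
    print(ppo_clipped_objective(math.log(2.0), 0.0, advantage=1.0))
```

The clipping is what keeps each update conservative: once the new model has moved far enough from the old one on a given response, pushing further in the same direction earns no extra objective.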

These steps are actually very similar to DeepMind's work and quite intuitive, although DeepMind did some extra training targeted at harmful conversations (pornography, gambling, drugs). It is not yet known how ChatGPT handles this, but given OpenAI's language-model temperament, it may well all be brute-forced with data...

Of course, ChatGPT still has some limitations, such as:

  1. It states things it does not actually know with complete confidence

  2. Its answers can change a lot when the same question is asked repeatedly or the wording is adjusted slightly

  3. It is very long-winded, as the examples show. This is mainly due to bias in the training data: the annotators tend to prefer longer answers

  4. It does not always ask clarifying questions when a request is ambiguous

  5. Some harmful requests (pornography, gambling, drugs) still go unrecognized, and the authors plan to use an API to address them
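The last point amounts to putting a separate content filter around the model. Here is a minimal sketch of that pattern. The names `moderated_reply`, `toy_is_flagged` and `toy_generate` are hypothetical stand-ins so that the example runs on its own; they are not OpenAI's API, and a real filter would be a trained classifier rather than a keyword check.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that."


def moderated_reply(
    user_message: str,
    generate: Callable[[str], str],
    is_flagged: Callable[[str], bool],
) -> str:
    """Run a separate filter on both the user input and the model's draft."""
    if is_flagged(user_message):
        return REFUSAL
    draft = generate(user_message)
    return REFUSAL if is_flagged(draft) else draft


def toy_is_flagged(text: str) -> bool:
    # Hypothetical stand-in for a moderation classifier.
    return "forbidden" in text.lower()


def toy_generate(message: str) -> str:
    # Hypothetical stand-in for the dialogue model.
    return f"You said: {message}"


if __name__ == "__main__":
    print(moderated_reply("hello", toy_generate, toy_is_flagged))
    print(moderated_reply("a Forbidden topic", toy_generate, toy_is_flagged))
```

Checking the draft as well as the input matters because a harmless-looking question can still lead the model to produce something the filter should catch.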

To collect more feedback from everyone, OpenAI is holding a feedback contest from November 30 to December 30 [3], and anyone interested is welcome to take part.

Finally, let's wait and see for GPT-4, which should be released before long!

References

[1] OpenAI Blog: https://openai.com/blog/

[2] Building safer dialogue agents: https://www.deepmind.com/blog/building-safer-dialogue-agents

[3] Feedback Contest: https://cdn.openai.com/chatgpt/ChatGPT_Feedback_Contest_Rules.pdf


I am rumor, a punk and geeky AI algorithm lady

Graduated from Beihang University, NLP Algorithm Engineer, Google Developer Expert

Follow me, and I'll take you along to study and to grind

Let's spin, jump, and blink together in the age of artificial intelligence

"A model that can't debug code is not a good AI"

Origin blog.csdn.net/m0_37310036/article/details/128156940