Hello friends, I am rumor.
It's been a long time since I last checked OpenAI's official blog [1], but today something made me wonder when GPT-4 would be released. I opened the site on a whim and, sure enough, there was something new:
![d955d740b2a035bebcd8e74725b90ce2.jpeg](https://img-blog.csdnimg.cn/img_convert/d955d740b2a035bebcd8e74725b90ce2.jpeg)
Try it: https://chat.openai.com/
OpenAI has reached into dialogue again! Let's take a look at the official examples.
Helping a user debug code, asking follow-up questions over multiple turns:
![d38ad5a6b7358d950df15e4690fddd52.jpeg](https://img-blog.csdnimg.cn/img_convert/d38ad5a6b7358d950df15e4690fddd52.jpeg)
Identifying a dangerous question and declining to answer, then giving a better response once the user explains their intent:
![015d088c676da3df29e7c5176886c8bf.jpeg](https://img-blog.csdnimg.cn/img_convert/015d088c676da3df29e7c5176886c8bf.jpeg)
Following instructions, even when the user changes them several times, without getting annoyed:
![f738e6fc0d3dad0d203ccc1954c21af1.jpeg](https://img-blog.csdnimg.cn/img_convert/f738e6fc0d3dad0d203ccc1954c21af1.jpeg)
There is also a coreference resolution example, but it is too long to post here. At the end, OpenAI also compares ChatGPT with InstructGPT: InstructGPT simply executes instructions coldly, while ChatGPT comes across as warmer.
From the examples above, ChatGPT has several clear advantages over the dialogue work released by other big companies this year:
- It is based on GPT-3.5, which was trained on more data. I don't know the other details, but at any rate it can debug code.
- Strong multi-turn context understanding, as the coreference resolution and letter-writing examples show. A model without good memory and understanding of the message history might just start a new topic.
- More human-like. Most current models answer directly, while ChatGPT clearly has a "chat" process with the user. For example, when debugging code, it first replies that it is hard to say and asks for more information.
Designing a more human-like dialogue strategy is difficult, because we don't really know how to define "human-like". To tackle this, Google once broke the problem down into a set of metrics.
This time, OpenAI adopted the same strategy as DeepMind Sparrow [2]. Since we don't know which dimensions to use to measure dialogue quality, we can train directly on user feedback and let the model learn by itself.
Training from feedback: isn't that reinforcement learning, the place where everything eventually ends up?
![d2468da75af34b5481ce56fb8f60615e.jpeg](https://img-blog.csdnimg.cn/img_convert/d2468da75af34b5481ce56fb8f60615e.jpeg)
ChatGPT is trained in the following steps:
1. Train a dialogue model on top of GPT-3.5 with supervised data. The training data is written by hand by annotators.
2. Have humans label multiple responses generated by the model, and train a model to score dialogue responses.
3. Use the scoring model as the feedback signal to train the dialogue model with the PPO algorithm.
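To make the second and third steps above a bit more concrete, here is a minimal sketch in plain Python. It is not OpenAI's implementation. The pairwise ranking loss is an assumption borrowed from how InstructGPT-style scoring models are usually described, and the function names, the toy scores, and the `clip_eps=0.2` default are all hypothetical. The clipped objective is the standard PPO surrogate, applied here to a single sampled response.

```python
import math


def pairwise_ranking_loss(score_preferred: float, score_rejected: float) -> float:
    """Scoring-model loss for one labeled pair (assumed form, not confirmed by OpenAI).

    Equals -log(sigmoid(score_preferred - score_rejected)), computed in a
    numerically stable way. It shrinks as the scoring model rates the
    human-preferred response above the rejected one.
    """
    z = score_rejected - score_preferred
    # softplus(z) = log(1 + exp(z)), written so that exp() never overflows
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ppo_clipped_objective(
    logp_new: float,
    logp_old: float,
    advantage: float,
    clip_eps: float = 0.2,  # hypothetical default, chosen for illustration
) -> float:
    """Standard PPO clipped surrogate for one sampled response (to be maximized).

    logp_new / logp_old are the log-probabilities of the response under the
    current and the previous policy; advantage is derived from the scoring
    model's output.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes the incentive to move far from the old policy
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Toy numbers only: the scoring model agrees, then disagrees, with the label
    print(pairwise_ranking_loss(2.0, 0.0))
    print(pairwise_ranking_loss(0.0, 2.0))
    # The new policy doubles the probability of a response with positive advantage
    print(ppo_clipped_objective(math.log(2.0), 0.0, 1.0))
```

The clipping is what keeps each update close to the previous dialogue model: once the probability ratio leaves the `[1 - clip_eps, 1 + clip_eps]` band in the direction the advantage favors, the objective stops rewarding further movement.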
These steps are very similar to DeepMind's work and quite intuitive, though DeepMind did some extra training targeted at harmful conversations (pornography, gambling, drugs). It is not yet known how ChatGPT handles this, but given OpenAI's usual approach to language models, it may well just be done by piling on data...
Of course, ChatGPT still has some limitations, such as:
- Confidently stating things it does not actually know.
- Giving very different answers when the same question is asked again or the wording is changed slightly.
- Being very long-winded, as the examples show. This mainly comes from bias in the training data: the annotators tend to prefer long answers.
- Not always asking a clarifying question when the request is ambiguous.
- Some harmful requests (pornography, gambling, drugs) still go unrecognized, and the authors plan to address this with an API.
To encourage everyone to give more feedback, OpenAI is holding a feedback contest from 11.30 to 12.30 [3], and anyone interested is welcome to take part.
Finally, let's wait and see for GPT-4, which should be released before long!
References
[1] OpenAI Blog: https://openai.com/blog/
[2] Building safer dialogue agents: https://www.deepmind.com/blog/building-safer-dialogue-agents
[3] Feedback Contest: https://cdn.openai.com/chatgpt/ChatGPT_Feedback_Contest_Rules.pdf
I'm rumor, a punk and geeky AI algorithm engineer.
Graduate of Beihang University, NLP algorithm engineer, Google Developer Expert.
Follow me and I'll take you along to learn and to grind.
Let's spin, jump, and blink together in the age of artificial intelligence.
"A model that can't debug code is not a good AI"