Developers are ecstatic! The shock leak of LLaMA has set off a frenzy of ChatGPT replacements, and the open-source LLM landscape will never be the same

Source: Xinzhiyuan (WeChat ID: AI-era)

The leak of Meta's LLaMA model has ushered in the Stable Diffusion moment for large language models.

Who would have thought that an accidental LLaMA leak would ignite the biggest spark of innovation in the open-source LLM field?

A series of outstanding open-source ChatGPT alternatives, the "alpaca family", has since made a dazzling debut.

The friction between open-source and API-based distribution is one of the most prominent tensions in the generative AI ecosystem.

In the text-to-image domain, the release of Stable Diffusion clearly demonstrated that open source is a viable distribution mechanism for foundation models.

However, this has not been the case for large language models, where the biggest breakthroughs, such as GPT-4, Claude, and Cohere's models, are available only through APIs.

Open-source alternatives to these models have not demonstrated the same level of performance, especially in their ability to follow human instructions. But an unexpected leak changed the situation completely.

LLaMA's "epic" leak

A few weeks ago, Meta AI introduced the large language model LLaMA.

LLaMA comes in several sizes, with 7B, 13B, 33B, and 65B parameters; although smaller than GPT-3, it matches GPT-3's performance on many tasks.

LLaMA was not originally open source; its weights were available only to approved researchers. But a week after its release, the model was suddenly leaked on 4chan, sparking thousands of downloads.

This event can be called an "epic leak" because it has become an endless source of innovation in the field of large language models.

In just a few weeks, innovation in LLMs built on top of it has exploded.

Alpaca, Vicuna, Koala, ChatLLaMA, FreedomGPT, ColossalChat... Let us review how this "alpaca family" explosion came about.

Alpaca

In mid-March, Alpaca, the large model released by Stanford, took the spotlight.

Alpaca is a brand-new model fine-tuned from Meta's LLaMA 7B. It uses only 52K training examples, yet its performance is roughly on par with GPT-3.5.

Crucially, its training cost was surprisingly low: less than $600.

Stanford researchers compared GPT-3.5 (text-davinci-003) with Alpaca 7B and found the two models' performance very similar: in a blind pairwise comparison, Alpaca won 90 times to GPT-3.5's 89.

For the Stanford team, training a high-quality instruction-following model on a budget meant facing two important challenges: obtaining a strong pre-trained language model, and obtaining high-quality instruction-following data.

As it happens, the LLaMA model provided to academic researchers solved the first problem.

For the second challenge, the paper "Self-Instruct: Aligning Language Models with Self-Generated Instructions" offered a good starting point: use an existing strong language model to automatically generate instruction data.

The LLaMA model's biggest weakness is its lack of instruction tuning. One of OpenAI's biggest innovations was applying instruction tuning to GPT-3.

Accordingly, Stanford used an existing large language model to automatically generate instruction-following demonstrations.
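To make this concrete, here is a minimal sketch of self-instruct-style generation in the spirit of the Alpaca pipeline, assuming the pre-1.0 OpenAI Python SDK. The prompt wording and seed format are illustrative assumptions, not Stanford's actual code; the real pipeline starts from 175 human-written seed tasks and filters the generations before training.

```python
# A sketch of self-instruct-style generation: show a strong model a few seed
# tasks and ask it to invent new ones. Prompt wording here is an assumption.
import openai  # pre-1.0 OpenAI SDK; expects OPENAI_API_KEY in the environment

SEED_TASKS = [
    {"instruction": "Classify the sentiment of this sentence.",
     "input": "I loved the movie.",
     "output": "positive"},
    # ...Alpaca's real pipeline starts from 175 human-written seed tasks
]

def build_prompt(seeds, n_new=5):
    """Format seed tasks as few-shot examples and request new, diverse ones."""
    examples = "\n\n".join(
        f"Instruction: {t['instruction']}\nInput: {t['input']}\nOutput: {t['output']}"
        for t in seeds
    )
    return (f"Here are some example tasks:\n\n{examples}\n\n"
            f"Write {n_new} new and diverse tasks in the same format.")

response = openai.Completion.create(
    model="text-davinci-003",        # the same model Alpaca distilled from
    prompt=build_prompt(SEED_TASKS),
    max_tokens=1024,
    temperature=1.0,                 # high temperature encourages diversity
)
print(response.choices[0].text)      # parse into new (instruction, input, output) triples
```

Looping this and deduplicating the outputs is how 52K examples can be produced for a few hundred dollars in API calls.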

Netizens now regard Alpaca outright as "the Stable Diffusion of large language models".

Vicuna

In late March, researchers from UC Berkeley, Carnegie Mellon, Stanford, and UC San Diego open-sourced Vicuna, a fine-tuned version of LLaMA whose performance is said to rival GPT-4's.

Vicuna, with 13 billion parameters, was obtained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT, at a training cost of roughly $300.

The results show that Vicuna-13B achieves more than 90% of the quality of ChatGPT and Bard.

The details of the Vicuna-13B training process are as follows:

First, the researchers collected about 70K conversations from the ChatGPT conversation sharing website ShareGPT.

Next, the researchers improved the training script provided by Alpaca so the model could better handle multi-turn dialogue and long sequences. Training then ran for one day on 8 A100 GPUs using PyTorch FSDP.
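As a rough illustration of that setup, here is a minimal FSDP training sketch, assuming a process-per-GPU launch via `torchrun --nproc_per_node=8`. The tiny transformer and random batches are placeholders for LLaMA-13B and the tokenized ShareGPT conversations; Vicuna's actual training script lives in its FastChat repository.

```python
# Minimal PyTorch FSDP sketch: shard parameters, gradients, and optimizer
# state across GPUs. Placeholder model and data, not Vicuna's real script.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")                    # one process per GPU under torchrun
torch.cuda.set_device(dist.get_rank())

model = torch.nn.Transformer(d_model=512).cuda()   # stand-in for LLaMA-13B
model = FSDP(model)                                # wrap the model for sharded training
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

for step in range(10):                             # stand-in for epochs over ShareGPT data
    src = torch.randn(16, 4, 512, device="cuda")   # (seq_len, batch, d_model)
    tgt = torch.randn(16, 4, 512, device="cuda")
    loss = model(src, tgt).mean()                  # dummy objective; real runs use causal-LM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

dist.destroy_process_group()
```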

For quality assessment of the model, the researchers created 80 different questions and rated the model output with GPT-4.

To compare the different models, the researchers combined each model's output into a single prompt and asked GPT-4 to evaluate which model gave the better answer.
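In code, that pairwise judging step might look like the following minimal sketch. The judging prompt is an illustrative assumption, not the exact wording the Vicuna team used; their evaluation prompts are maintained in the FastChat repository.

```python
# A sketch of GPT-4-as-judge: put both models' answers into one prompt and
# ask GPT-4 to score them. The prompt wording is an assumption.
import openai  # pre-1.0 OpenAI SDK; expects OPENAI_API_KEY in the environment

def judge(question: str, answer_a: str, answer_b: str) -> str:
    prompt = (
        f"Question: {question}\n\n"
        f"Assistant A's answer:\n{answer_a}\n\n"
        f"Assistant B's answer:\n{answer_b}\n\n"
        "Rate each assistant from 1 to 10 and briefly explain which answer is better."
    )
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic judging
    )
    return response.choices[0].message.content

# e.g. judge("Explain RLHF in one paragraph.", vicuna_answer, alpaca_answer)
```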

[Table: comparison of LLaMA, Alpaca, Vicuna, and ChatGPT]

Koala

Recently, the Berkeley AI Research lab (BAIR) released a new model, Koala. Whereas previous models were instruction-tuned on data from OpenAI's GPT models, Koala's difference is that it is trained on high-quality data obtained from the web.

The findings show that Koala can effectively answer a wide variety of user queries, generating answers that tend to be preferred over Alpaca's and are comparable to ChatGPT's in at least half of cases.

The researchers hope these results will further the discussion around the relative performance of large closed-source models versus small public models. In particular, the results show that small models that can be run locally can achieve the performance of large models if the training data is carefully collected.

In fact, the experimental results of Stanford's earlier Alpaca, which fine-tuned LLaMA on data from OpenAI's GPT models, had already shown that the right data can significantly improve smaller open-source models.

This is precisely why the Berkeley researchers developed and released Koala: to provide another experimental data point for this discussion.

Koala is fine-tuned on freely available interaction data obtained from the web, with special attention to data that includes interactions with high-performance closed-source models such as ChatGPT.

Rather than scraping as much web data as possible to maximize quantity, the researchers focused on collecting a small, high-quality dataset, including ChatGPT distillation data and open-source data.

ChatLLaMA

Nebuly open-sourced ChatLLaMA, a framework that lets us create conversational assistants using our own data.

ChatLLaMA lets us create hyper-personalized ChatGPT-like assistants using our own data and as little computation as possible.

The assumption is that, in the future, we will no longer rely on one large assistant that "rules everyone"; instead, everyone will be able to create their own personalized ChatGPT-like assistants supporting all kinds of human needs.

However, creating such a personalized assistant requires efforts in many areas: dataset creation, efficient training using RLHF, and inference optimization.

The purpose of this library is to give developers peace of mind by abstracting away the work required for computational optimization and for collecting large amounts of data.

  

ChatLLaMA is designed to help developers tackle various use cases, all related to RLHF training and inference optimization. Here are some reference use cases (a hypothetical workflow sketch follows the list):

  • Create ChatGPT-like personalized assistants for vertical-specific tasks (legal, medical, gaming, academic research, etc.);

  • Train an efficient ChatGPT-like assistant on local hardware infrastructure using limited data;

  • Create your own personalized version of a ChatGPT-like assistant while avoiding runaway costs;

  • Understand which model architecture (LLaMA, OPT, GPT-J, etc.) best fits your hardware, compute budget, and performance requirements;

  • Align the assistant with your personal/company values, culture, brand, and manifesto.
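To make the shape of that workflow concrete, here is a purely hypothetical sketch of the pipeline such a framework abstracts away: choose an architecture, fine-tune it on your own dialogues, then run RLHF. Every name in it is illustrative, not Nebuly's actual API; consult the ChatLLaMA README for the real entry points.

```python
# Hypothetical sketch only -- these names are NOT ChatLLaMA's real API.
# The point is the shape of the pipeline a personalized-assistant framework
# wraps: supervised fine-tuning, reward modeling, then RL fine-tuning.
from dataclasses import dataclass

@dataclass
class AssistantConfig:
    architecture: str = "llama-7b"                 # or "opt-6.7b", "gpt-j-6b", ...
    sft_dataset: str = "my_dialogues.json"         # your own conversation data
    comparison_dataset: str = "my_rankings.json"   # human preference rankings

def build_assistant(cfg: AssistantConfig) -> str:
    # Stage 1: supervised fine-tuning on user-provided dialogues.
    print(f"SFT: fine-tuning {cfg.architecture} on {cfg.sft_dataset}")
    # Stage 2: fit a reward model on the human preference comparisons.
    print(f"RM: training reward model on {cfg.comparison_dataset}")
    # Stage 3: reinforcement learning (e.g. PPO) against the reward model.
    print(f"RL: optimizing {cfg.architecture} against the reward model")
    return f"{cfg.architecture}-personalized"

if __name__ == "__main__":
    print(build_assistant(AssistantConfig()))
```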

  

FreedomGPT

Built with Electron and React, FreedomGPT is a desktop application that allows users to run LLaMA on their local machines.

FreedomGPT's defining trait is evident from its name: the answers it gives are not subject to any censorship or safety filtering.

The program was developed by Age of AI, an AI venture capital firm.

FreedomGPT is built on top of Alpaca, drawing on Alpaca's salient strengths: it is relatively more accessible and customizable than other models.

ChatGPT follows OpenAI's usage policy, limiting hate, self-harm, threats, violence, and sexual content.

Unlike ChatGPT, FreedomGPT answers questions without bias or favoritism, and does not hesitate to address controversial or contentious topics.

FreedomGPT even answered "how to make a bomb at home", a response OpenAI specifically removed from GPT-4.

FreedomGPT is unique in that it overcomes censorship restrictions and caters to controversial topics without any safeguards. Its symbol is the Statue of Liberty, because this unique and bold large language model symbolizes freedom.

FreedomGPT can even run locally on a computer without the need for an internet connection.

Additionally, an open-source version will be released soon, enabling full customization by users and organizations.

ColossalChat

ColossalChat, from the Colossal-AI team, needs fewer than 10 billion parameters to attain bilingual Chinese-English capability, with results comparable to ChatGPT and GPT-3.5.

In addition, ColossalChat, based on the LLaMA model, reproduces the complete RLHF pipeline, making it currently the open-source project closest to ChatGPT's original technical route.

Bilingual Chinese-English training dataset

ColossalChat released a bilingual dataset containing about 100,000 Chinese-English question-answer pairs.

The seed dataset was collected and cleaned from real question scenarios on social media platforms, then expanded via self-instruct, at an annotation cost of about $900.

Compared with datasets generated by other self-instruct methods, this dataset contains more realistic and diverse seed data covering a wider range of topics.

This dataset is suitable for both fine-tuning and RLHF training. Given high-quality data, ColossalChat achieves better conversational interactions, and it also supports Chinese.

Complete RLHF pipeline

The RLHF replication consists of three stages:

In RLHF-Stage1, the bilingual dataset described above is used for supervised instruction fine-tuning of the model.

In RLHF-Stage2, the reward model is trained: human annotators rank different outputs for the same prompt, and these rankings supervise the reward model's score assignments.

In RLHF-Stage3, a reinforcement learning algorithm (PPO) is applied; this is the most complex part of the training process.
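The Stage-2 objective described above is worth spelling out: it is the standard pairwise ranking loss from InstructGPT-style RLHF, which pushes the reward model to score the human-preferred answer higher. A minimal sketch follows; the linear scoring head and random embeddings are placeholders for a full LLaMA-based reward model over real answer pairs.

```python
# Minimal sketch of the pairwise reward-model loss used in RLHF Stage 2:
# for two answers to the same prompt, minimize -log sigmoid(r_chosen - r_rejected)
# so the human-preferred ("chosen") answer gets the higher score.
import torch
import torch.nn.functional as F

reward_model = torch.nn.Linear(768, 1)   # placeholder: real RMs put a scalar head on an LLM
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

def pairwise_loss(chosen_emb: torch.Tensor, rejected_emb: torch.Tensor) -> torch.Tensor:
    r_chosen = reward_model(chosen_emb)      # score for the ranked-better answer
    r_rejected = reward_model(rejected_emb)  # score for the ranked-worse answer
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# One training step on a dummy batch of (chosen, rejected) answer embeddings.
chosen, rejected = torch.randn(4, 768), torch.randn(4, 768)
loss = pairwise_loss(chosen, rejected)
loss.backward()
optimizer.step()
```

In Stage 3, this trained reward model scores the policy's generations, and PPO updates the policy to maximize that reward.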

I believe that more projects will be released soon.

No one expected that an accidental LLaMA leak would ignite the biggest spark of innovation in the open-source LLM field.

References:

https://thesequence.substack.com/p/the-llama-effect-how-an-accidental


Origin: blog.csdn.net/lqfarmer/article/details/130160602