In the era of AIGC, build your own large model based on cloud-native MLOps (Part 2)

To meet enterprises' need for up-to-date, continuously iterated productivity tools during digital transformation, Lingqueyun (Alauda) recently launched the Alauda MLOps solution, helping enterprises quickly adopt AI technology and deliver intelligent applications and services.


AIGC large models have become the engine of enterprise innovation

With the explosion of ChatGPT, more and more people are considering using AI to improve the efficiency and quality of their daily work and to generate the text they need through dialogue. Whether it is summarizing data into tables, writing articles from prompts, or answering professional-knowledge questions, with appropriate prompt engineering ChatGPT can give strong answers and even take over part of the human workload.

In addition, AI-generated content is not limited to text; there are also tools for AI painting (Stable Diffusion), music composition (Amper Music), video generation (Runway), and more. These all fall under AIGC (AI-Generated Content), and they are constantly raising the bar for productivity across many industries.

Alauda MLOps helps enterprises quickly build their own large models


However, to accomplish the above work, an enterprise often needs an on-premises model that it owns and manages, because such a model ensures:


·  Security: during conversations, the enterprise does not want to send its internal data to an AI model on the public internet;

·  Customization: the enterprise wants to use its own data to strengthen the model's capabilities in specific scenarios (fine-tuning);

·  Content review: in accordance with laws and regulations, input and output content must undergo a second round of filtering.


So, in such a scenario, how can an enterprise quickly build and customize such a model? The answer is cloud-native MLOps + publicly available pre-trained models!


According to OpenAI, it trains ultra-large models such as ChatGPT/GPT-4 on large-scale Azure + MPI GPU computing clusters. In a private cloud-native environment, using the MLOps tool chain, enterprises can likewise obtain horizontally scalable machine learning computing power. Using an MLOps platform brings the following improvements:


·  Better support for the training and prediction workflows of large-scale pre-trained models;

·  Lower the barrier to applying large models: a built-in tutorial workflow for using pre-trained large models lets you get started in one step;

·  A complete platform for conventional machine learning and deep learning;

·  Pipeline + scheduler to uniformly orchestrate large-scale distributed training tasks, with support for customizing various distributed training methods and frameworks, including DDP, pipeline parallelism, ZeRO, and FSDP;

·  Process customization: select a subset of the MLOps tool chain according to the actual business to build a suitable workflow;

·  A complete MLOps platform: a smooth, end-to-end MLOps tool chain.


Next, taking the Alauda MLOps platform as an example, we will show how to build your own "ChatGPT" based on the LLaMA pre-trained model and its LoRA chat adaptation (alpaca-lora), and how to customize and launch an LLM dialogue model.


In addition, other Hugging Face pre-trained models, such as Vicuna and MPT, can be used to quickly build your own model in the same way. Interested readers are invited to try it themselves.


· How to obtain ·

Enterprise MLOps:

https://www.alauda.cn/open/detail/id/740.html

Open source version of MLOps:

https://github.com/alauda/kubeflow-chart

How do you customize and deploy a large pre-trained chat model with cloud-native MLOps?

First, we need to start a Notebook environment and allocate the necessary GPU resources to it (in our measurements, fine-tuning the alpaca 7B model in half precision requires four K80 GPUs, or a single RTX 4090, plus sufficient memory):
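As a rough, assumption-laden estimate of why this much GPU memory is enough: with LoRA, the 7B base weights stay frozen and only small adapter matrices are trained, so the dominant cost is simply holding the half-precision base model.

```python
# Back-of-envelope GPU memory estimate for LoRA fine-tuning a 7B model (illustrative assumptions).
params_base = 7e9                                  # frozen base model parameters
base_fp16_gb = params_base * 2 / 1024**3           # ~13 GB for the frozen weights in half precision

lora_params = 16e6                                 # rough size of the trainable LoRA adapters (depends on rank/targets)
# fp16 weights + fp16 gradients + Adam optimizer states (~8 bytes/param), for the adapters only
lora_train_gb = lora_params * (2 + 2 + 8) / 1024**3

print(f"frozen base weights: ~{base_fp16_gb:.0f} GB")
print(f"trainable LoRA state: ~{lora_train_gb:.2f} GB")
# Activations add a few more GB depending on batch size and sequence length, which is why a single
# 24 GB RTX 4090, or four 12 GB K80 GPUs with the model sharded across them, is sufficient.
```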

Then, we need to prepare the corresponding code and model files from GitHub and Hugging Face.


·  Download the project https://github.com/tloen/alpaca-lora, then drag and drop it into the Notebook file navigation bar to upload it. You can also run git clone in the Notebook to download it;

·  Download the language model pre-trained weights from https://huggingface.co/decapoda-research/llama-7b-hf and drag and drop them into the Notebook to upload. You can also use git lfs clone in the Notebook to download the model;

·  Download the LoRA pre-trained weights from https://huggingface.co/tloen/alpaca-lora-7b and drag and drop them into the Notebook to upload. You can also use git lfs clone in the Notebook to download the model.


Uploading such large model files can take a long time. If the Notebook has a good network connection to Hugging Face, you can instead download them directly from the network inside the Notebook, for example with the sketch below.
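As an alternative to git lfs clone, a small Python snippet using the huggingface_hub library (assuming a recent release that supports the local_dir argument; repository IDs taken from the links above) can pull both repositories onto the Notebook's disk:

```python
# Sketch: download the base LLaMA weights and the alpaca-lora adapter weights
# into local folders on the Notebook volume, instead of uploading them manually.
from huggingface_hub import snapshot_download

# Base language model weights (~13 GB of fp16 shards) -- this can take a while.
snapshot_download(repo_id="decapoda-research/llama-7b-hf", local_dir="llama-7b-hf")

# LoRA adapter weights published by the alpaca-lora project (only tens of MB).
snapshot_download(repo_id="tloen/alpaca-lora-7b", local_dir="alpaca-lora-7b")
```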


Next, we use the pre-trained model we just downloaded to start an AI dialogue web application and verify the effect, mounting the disk used by the Notebook so the service can read these model files:

We can then create a prediction service using a YAML configuration or the native application creation form. Note that the inference service needs only a single K80 GPU to start.
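For reference, the core of what such an inference service does, assuming the base and LoRA weights downloaded above, is sketched below using transformers and peft (the generate.py script in the alpaca-lora project wraps the same logic in a Gradio web UI):

```python
# Minimal sketch of the serving logic: load the frozen LLaMA base model,
# apply the alpaca-lora adapter on top of it, and answer a prompt.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

BASE = "llama-7b-hf"         # path to the base weights downloaded earlier
LORA = "alpaca-lora-7b"      # path to the LoRA adapter weights

tokenizer = LlamaTokenizer.from_pretrained(BASE)
model = LlamaForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, LORA, torch_dtype=torch.float16)
model.eval()

# Alpaca-style prompt template used by the project
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWhat is MLOps?\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```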


The image we use here is built with a Dockerfile that packages the runtime dependencies of the alpaca-lora project (PyTorch, transformers, peft, gradio, and so on) on top of a CUDA base image.


After the inference service starts, we can open it in the browser and try various conversations with the model. Because the alpaca-lora model's Chinese support is limited, Chinese can be entered but most of the output is still in English. Even so, the model already demonstrates fairly strong capabilities.


Finally, we can use our own labeled data to optimize and customize the model (fine-tuning). Following the instructions of the alpaca-lora project, prepare fine-tuning data in the training data format shown below, then start training. During this training only a small number of parameters (the LoRA adapters) are updated; the parameters of the underlying pre-trained language model (LLM) are left untouched, preserving its powerful base capabilities.
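For illustration, here is a minimal sketch of what such fine-tuning data looks like in the Alpaca instruction format used by alpaca-lora (the field names come from the project; the record contents are made up for illustration):

```python
# Illustrative fine-tuning records in the Alpaca instruction format expected by
# alpaca-lora's finetune.py: "instruction", optional "input", and "output".
import json

finetune_data = [
    {
        "instruction": "Summarize the customer feedback below in one sentence.",
        "input": "The delivery was fast, but the packaging was damaged and support was slow to respond.",
        "output": "Delivery was quick, but packaging quality and support responsiveness need improvement.",
    },
    {
        "instruction": "What is the company's annual leave policy?",
        "input": "",
        "output": "Full-time employees receive 15 days of paid annual leave, increasing with seniority.",
    },
]

# Save as a JSON file and point finetune.py at it (per the project README, via its --data_path argument).
with open("my_finetune_data.json", "w", encoding="utf-8") as f:
    json.dump(finetune_data, f, ensure_ascii=False, indent=2)
```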


The above is direct training in a Notebook, which can use at most the GPU cards on a single physical node. If the training workflow becomes more complex, or requires distributed training across physical nodes, you can package the training Python program as a pipeline and submit it to the cluster to run. For multi-node, multi-GPU or model-parallel training, you can also configure the number of training nodes and implement the corresponding distributed code in the Python program for your chosen framework; the MLOps pipeline scheduling itself requires no code changes.


Note that Alauda MLOps supports building distributed training steps directly in the task pipeline. Unlike the Kubeflow Training Operator approach, users do not need to write Kubernetes YAML configuration files for TFJob or PyTorchJob; they simply drag and drop a Python program in as a workflow step. The parallelism of that node can be set separately via the pipeline's ParallelFor primitive. In this way, data parallelism (DDP), pipeline parallelism, FSDP, or other distributed training methods, as well as training with frameworks such as transformers and accelerate, can all be customized within the pipeline, for example as in the sketch below.
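As an illustration, the Python program dropped into such a pipeline step can be ordinary PyTorch DDP code; a minimal sketch, assuming the platform injects the usual RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT environment variables into each replica, might look like this:

```python
# Minimal PyTorch DDP skeleton for a pipeline step replicated with ParallelFor.
# Assumes RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are set for each replica.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # reads RANK / WORLD_SIZE from the environment
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()     # placeholder for the real model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):                        # placeholder training loop
        x = torch.randn(8, 1024, device="cuda")
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```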

In addition, distributed training pipelines built on the MLOps platform can use the Volcano scheduler for GPU and Pod scheduling, preventing the resource waste that occurs when multiple tasks hold resources while waiting on each other.

In this way, after dragging and dropping the Python code, we configure the task's parallelism, the CPU, memory, and GPU resources required by each node, and the runtime image, then click the "Submit to run" button in the interface to start the task and monitor its running status.


After fine-tuning completes, you can follow the steps above to start an inference service with the new model and begin verification. At this point, you have a "ChatGPT" of your own!


Of course, if you feel that the current 7B model (7 billion parameters) is too limited, you can try larger models such as 13B, 30B, or 65B, or use model structures other than alpaca-lora, such as:


https://huggingface.co/tiiuae/falcon-40b

https://huggingface.co/lmsys/vicuna-13b-delta-v1.1

https://huggingface.co/mosaicml/mpt-7b-chat

https://github.com/ymcui/Chinese-LLaMA-Alpaca

https://huggingface.co/THUDM/chatglm-6b


It is also worth mentioning that future versions will support even smoother training and prediction workflows for large models, so stay tuned for our updates.


If you want to evaluate the capabilities of these public models, or create your own ChatGPT, let the cloud-native MLOps platform help you get there.

Previous: AIGC era, build your own large model based on cloud-native MLOps (Part 1)
