Build Your Own GPT Knowledge Base with LangChain in About 30 Lines of Code

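LangChain, a ChatGPT-related project, has been getting a lot of attention lately. It has become a very popular open-source repository and is still evolving quickly!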

As we all know, the data ChatGPT was trained on is out of date and the model cannot go online, so the answers and figures it gives are often wrong.

Imagine feeding our local knowledge documents in as the prompt and having ChatGPT answer questions based on that material. Wouldn't that be cool? LangChain makes this easy to do.
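To make the idea concrete, here is a minimal sketch of what "using documents as the prompt" can look like: excerpts from local files are pasted in front of the question before the text is sent to the model. The function name build_prompt and the instruction wording are assumptions for illustration, not something LangChain or the repository defines.

```python
def build_prompt(context_chunks, question):
    """Combine local document excerpts and a question into one prompt string."""
    # Join the excerpts with blank lines so the model can tell them apart
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    chunks = ["The exam requires a webcam and a microphone."]
    print(build_prompt(chunks, "What do I need for the exam?"))
```

The libraries used later in this post do roughly this for you, plus the work of picking which excerpts are relevant.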

LangChain

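LangChain is a powerful application development framework focused on helping developers build end-to-end applications. It provides a set of tools, components, and interfaces that make it easy to quickly build applications that rely on large language models (LLMs) and chat models. With LangChain, developers can easily manage interactions with language models, link multiple components together seamlessly, and integrate additional resources (such as APIs and databases) to streamline development.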
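The "linking components" idea can be sketched in plain Python: each step takes a string and returns a string, and a chain simply feeds one step's output into the next. This is only a conceptual illustration and not LangChain's actual API; make_chain, prompt_template, fake_llm, and tidy are hypothetical names, and fake_llm stands in for a real model call.

```python
from typing import Callable, List

Step = Callable[[str], str]


def make_chain(steps: List[Step]) -> Step:
    """Return a function that feeds each step's output into the next step."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run


def prompt_template(question: str) -> str:
    # Step 1: wrap the raw question in a prompt
    return f"Q: {question}\nA:"


def fake_llm(prompt: str) -> str:
    # Step 2: stand-in for a real model call (hypothetical, no network access)
    return f"(model reply to a {len(prompt)}-character prompt)"


def tidy(reply: str) -> str:
    # Step 3: post-process the model output
    return reply.strip("()")


chain = make_chain([prompt_template, fake_llm, tidy])

if __name__ == "__main__":
    print(chain("What is LangChain?"))
```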

Next, we'll walk through a simple example of combining LangChain with ChatGPT to build a private AI knowledge base.

Deployment

The whole deployment process is as follows:

Step 1: get the code and install the Python libraries

git clone [email protected]:christhai/langchain-chatbot.git
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirement.txt

The complete app.py is shown below (under 30 lines!)

from llama_index import SimpleDirectoryReader, LangchainEmbedding, GPTListIndex, GPTSimpleVectorIndex, PromptHelper, LLMPredictor, ServiceContext
from langchain import OpenAI
import gradio as gr
import sys
import os
os.chdir(r'/home/ubuntu/langchain')  # project path
os.environ["OPENAI_API_KEY"] = 'sk-xxxxxxxxxxxxxxx'

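def construct_index(directory_path):
    max_input_size = 4096  # maximum prompt size, in tokens
    num_outputs = 2000  # maximum number of tokens in the answer
    max_chunk_overlap = 20  # overlap between neighbouring chunks
    chunk_size_limit = 600  # maximum size of each document chunk
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.7, model_name="gpt-3.5-turbo", max_tokens=num_outputs))
    documents = SimpleDirectoryReader(directory_path).load_data()  # read every file in the directory
    # pass prompt_helper as well, otherwise the settings above are never used
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
    index.save_to_disk('index.json')  # persist the index so chatbot() can load it
    return index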

def chatbot(input_text):
    index = GPTSimpleVectorIndex.load_from_disk('index.json')  # reload the index saved by construct_index
    response = index.query(input_text, response_mode="compact")
    return response.response

iface = gr.Interface(fn=chatbot,
                     inputs=gr.inputs.Textbox(lines=7, label="What would you like to get from the knowledge base?"),
                     outputs="text",
                     title="AI Local Knowledge Base ChatBot")
index = construct_index("docs")
iface.launch(share=True)
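For intuition about what the vector index does with chunk_size_limit and max_chunk_overlap, here is a toy, standard-library-only sketch: split the text into overlapping chunks, turn each chunk into a vector, and return the chunk closest to the question. It rests on simplifying assumptions: chunks are measured in words rather than tokens, and the "embedding" is a plain word count instead of the OpenAI embeddings the real index relies on. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=600, overlap=20):
    """Split text into word-based chunks; neighbours share `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy 'embedding': lower-cased word counts (a real index uses a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(chunks, question, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]
```

The retrieved chunks are what end up in the prompt next to the question, which is why the answers stay close to the content of your own documents.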

Step 2: change the project path and the OpenAI API key on lines 6 and 7 of the code

os.chdir(r'path/to/your/project/folder')
os.environ["OPENAI_API_KEY"] = 'your OpenAI API key'
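If you would rather not keep the key in the source file, one option is to export OPENAI_API_KEY in your shell and have the script only check that it is present. A minimal sketch, assuming a hypothetical helper named load_api_key that is not part of the original app.py:

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running app.py")
    return key
```

With this in place, the hard-coded assignment on line 7 could be replaced by a call to load_api_key() at startup.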

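Step 3: put the documents you want to use as prompt material in the docs directory (for example, I used my old CKA exam study notes and the official exam tips)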

ubuntu@instance-k8s:~/langchain/docs$ ls
cka.txt  exam.txt
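Before building the index, it can help to confirm that the docs directory really contains something to index. A small sketch using only the standard library; list_documents is a hypothetical helper, not part of the repository:

```python
from pathlib import Path


def list_documents(directory="docs"):
    """Return the non-empty regular files in `directory`, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"{directory!r} is not a directory")
    # Skip sub-directories and empty files, which add nothing to the index
    return sorted(p for p in root.iterdir() if p.is_file() and p.stat().st_size > 0)


if __name__ == "__main__":
    for path in list_documents("docs"):
        print(path.name)
```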

Finally, run the program (simple, isn't it!)

python3 app.py

Then open http://127.0.0.1:7860 locally, or the server URL, in your browser.
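If the page does not load, a quick way to tell whether the app is listening at all is to try a TCP connection to the port. A minimal sketch with the standard library; is_port_open is a hypothetical helper, and 7860 is simply the port from the URL above:

```python
import socket


def is_port_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    print("app is reachable" if is_port_open() else "nothing is listening on port 7860")
```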

Demo

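Next, let's see how it works in practice:

For example, I asked "What You Need For Your Exam", based on the content of the CKA exam tips document

LangChain organizes the content from the documents and returns the following answer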

In order to take the CKA or CKAD exam, you will need a computer with Chrome or Chromium browser, reliable internet access, a webcam, and a microphone. You will also need to have a current, non-expired government ID that has your photo and full name in the Latin alphabet. Additionally, you should run the compatibility check tool provided by the Exam Proctoring Partner to verify that your hardware meets the minimum requirements.

Or, when I looked up "how to create K8S deployment with yaml" in my old CKA notes, it also returned the relevant answer from the notes.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rss
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: rss
    spec:
      containers:
      - name: front-end
        image: nginx
        ports:
        - containerPort: 80
      - name: rss-reader
        image: nickchase/rss-php-nginx:v1
        ports:
        - containerPort: 88

If you're interested, try deploying it yourself!