A first look at the result
First, find a paper. I am using one I had on hand in PDF format.
I then have the model act as a smart assistant for that research paper. You can of course customize the prompt yourself.
Start asking questions
As you can see, the results are impressive.
How it works
- Extract the text from the PDF for subsequent processing.
- Because the OpenAI API limits the number of tokens per request, split the PDF text into fragments smaller than the token limit.
- Use OpenAI's Embedding API to generate a vector for each fragment and save the vectors to a database (Postgres).
- Start asking questions:
- Convert the user's question into a vector.
- Use cosine similarity to compare the question vector with the vectors in the database and find the text fragments most similar to the question.
- Feed those fragments to ChatGPT and have it answer the user's question based on them.
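The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code from the download: `toy_embed` is a hypothetical stand-in that counts words so the example runs offline, whereas the real pipeline would call OpenAI's Embedding API and store the vectors in Postgres. The helper names and the word-based `max_words` limit (a rough proxy for tokens) are assumptions as well.

```python
import math
from collections import Counter


def split_into_chunks(text, max_words=200):
    """Split text into pieces of at most max_words words (a rough proxy for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text):
    """Hypothetical stand-in for the Embedding API: a bag-of-words count vector."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dict-like counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(question, index, k=2):
    """Return the k chunks whose vectors are most similar to the question vector."""
    q_vec = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(q_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Assemble the text that would be sent to the chat model."""
    context = "\n---\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    paper_text = (
        "The model uses attention layers. "
        "Training took three days on eight GPUs."
    )
    # "Feed" step: chunk the text and keep (chunk, vector) pairs as the index.
    index = [(c, toy_embed(c)) for c in split_into_chunks(paper_text, max_words=6)]
    question = "How long did training take?"
    print(build_prompt(question, top_chunks(question, index, k=1)))
```

With real embeddings, only `toy_embed` and the storage of `index` would change; the chunking, ranking, and prompt assembly keep the same shape.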
I have put the code on a network disk; you can download it yourself.
Link: https://pan.baidu.com/s/1Os_DR8lC9gBtc2ONNN5YJg?pwd=6666
Extraction code: 6666
Environment setup
Python 3.7 or later is required; I use 3.8.
pip install -r requirements.txt
If an SSL error occurs at runtime, you can downgrade urllib3:
pip install urllib3==1.25.11
The code to run is this.
You will also need a network connection that can reach OpenAI's servers (a proxy or VPN where they are not directly accessible), because everything here ultimately calls the OpenAI API.
Before using it, you need to feed your corpus to OpenAI. This only has to be done once; if you change the corpus, feed it again.
After the first run, you can comment out the feeding step for subsequent runs.
Also, replace the API key with your own before running.
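As an alternative to commenting the feeding step in and out by hand, a script could decide for itself whether the corpus needs to be fed again. The sketch below is an assumption about how that might look, not part of the downloaded code: it stores a hash of the corpus in a small JSON file (the helper names and the state file are hypothetical) and reads the key from the `OPENAI_API_KEY` environment variable instead of hard-coding it.

```python
import hashlib
import json
import os


def corpus_fingerprint(text):
    """Hash the corpus so that a changed document can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_feeding(text, state_path):
    """Return True when the corpus has not been fed yet or has changed since."""
    if not os.path.exists(state_path):
        return True
    with open(state_path, "r", encoding="utf-8") as f:
        saved = json.load(f)
    return saved.get("fingerprint") != corpus_fingerprint(text)


def mark_fed(text, state_path):
    """Record the fingerprint of the corpus that was just fed."""
    with open(state_path, "w", encoding="utf-8") as f:
        json.dump({"fingerprint": corpus_fingerprint(text)}, f)


def get_api_key():
    """Read the key from the environment; returns an empty string if it is unset."""
    return os.environ.get("OPENAI_API_KEY", "")
```

A run would then call `needs_feeding(text, "feed_state.json")`, generate and store the embeddings only when it returns True, and finish with `mark_fed(text, "feed_state.json")`.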
Application scenarios
Uploading documents this way works around OpenAI's token limit and turns your documents into an assistant that helps you learn. You can also explore other ideas on your own, including ones that might be worth building a business on.