Enhancing with LlamaParse
In the previous example, we asked our document a very basic question about the total amount of the budget. Let's instead ask about a more complicated, specific fact in the document:
# Load and index the document as in the previous example
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query(
    "How much exactly was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget?"
)
print(response)
Unfortunately, we get an unhelpful answer:
The budget allocated funds to a new green investments tax credit, but the exact amount was not specified in the provided context information.
This is bad, because we happen to know the exact number is in the document! But the PDF is complicated, with tables and multi-column layouts, and the LLM missed the answer. Luckily, we can use LlamaParse to help us out.
First, you need a LlamaCloud API key. You can get one for free by signing up for LlamaCloud. Then put it in your .env file just like your OpenAI key:
LLAMA_CLOUD_API_KEY=llx-xxxxx
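If you are loading your keys with python-dotenv, as in the earlier examples (an assumption; any way of setting the environment variable works), the same call picks up the new key:

from dotenv import load_dotenv

# Reads LLAMA_CLOUD_API_KEY (and OPENAI_API_KEY) from the .env file into
# environment variables that LlamaParse and the LLM client can see.
load_dotenv()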
Now you're ready to use LlamaParse in your code. Let's bring it in as an import:
from llama_parse import LlamaParse
Then let's make a second attempt at parsing and querying the file (note that this uses documents2, index2, etc.) and see if we get a better answer:
# Use LlamaParse to convert the PDF to markdown before indexing
documents2 = LlamaParse(result_type="markdown").load_data(
    "./data/2023_canadian_budget.pdf"
)
index2 = VectorStoreIndex.from_documents(documents2)
query_engine2 = index2.as_query_engine()
response2 = query_engine2.query(
    "How much exactly was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget?"
)
print(response2)
We got it!
$20 billion was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget.
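If you're curious why this attempt succeeded, you can inspect what LlamaParse produced. It returns a list of Document objects whose text is a markdown rendering of the PDF, so tables and multi-column sections survive as readable text (a quick, illustrative check; the slice length is arbitrary):

# Peek at the first parsed document to see the markdown LlamaParse generated
print(documents2[0].text[:1000])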
As always, you can check out the repo to see what this code looks like.
As you can see, parsing quality makes a big difference to what the LLM can understand, even for relatively simple questions. Next, let's see how memory can help us answer more complex questions.