164 Enhancing with LlamaParse


In the previous example we asked the document a very basic question: the total budget. Let's ask for a more complex, specific fact instead:

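from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load every file in ./data with the default readers, then build a vector index.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()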

response = query_engine.query(
    "How much exactly was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget?"
)
print(response)

Unfortunately, we get an unhelpful answer:

The budget allocated funds to a new green investments tax credit, but the exact amount was not specified in the provided context information.
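One way to see what the LLM was actually given is to look at the chunks retrieved for the query. The sketch below assumes the response object exposes a `source_nodes` list whose items have a `score` and a `node` with `get_content()`, as LlamaIndex responses generally do; `summarize_sources` is a hypothetical helper name, not part of the library.

```python
def summarize_sources(response, max_chars=200):
    """Return (score, excerpt) pairs for the chunks behind a response.

    Assumes `response.source_nodes` exists; returns [] if it does not.
    """
    rows = []
    for hit in getattr(response, "source_nodes", []):
        # Collapse whitespace so table fragments print on one line.
        content = " ".join(hit.node.get_content().split())
        rows.append((hit.score, content[:max_chars]))
    return rows


# Usage with the `response` from the query above:
# for score, excerpt in summarize_sources(response):
#     print(score, excerpt)
```

If none of the excerpts contains a dollar figure, the problem lies in what was parsed and retrieved rather than in how the LLM read it.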

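That's frustrating, because we happen to know the exact figure is in the document! But the PDF is complex, with tables and multi-column layouts, and the LLM misses the answer. Fortunately, LlamaParse can help.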

First, you need a LlamaCloud API key. You can get one for free by signing up for LlamaCloud. Then put it in your .env file, just as you did with your OpenAI key:

LLAMA_CLOUD_API_KEY=llx-xxxxx
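Before calling either service, it can help to confirm that both keys actually reached the environment. This is a minimal sketch that assumes the OpenAI key is stored as `OPENAI_API_KEY` and that python-dotenv may be installed to load the .env file; `missing_keys` is a hypothetical helper, not part of any library.

```python
import os

try:
    # Assumption: python-dotenv is installed. Any loader that fills os.environ works.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

# Assumed variable names; adjust to match your own .env file.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


if load_dotenv is not None:
    load_dotenv()  # loads a .env file if one is found; existing variables are kept

missing = missing_keys(os.environ)
if missing:
    print("Missing keys:", ", ".join(missing))
```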

Now you can use LlamaParse in your code. Start by importing it:

from llama_parse import LlamaParse

Let's make a second attempt at parsing and querying the file (note that this uses documents2, index2, and so on) and see whether we get a better answer:

documents2 = LlamaParse(result_type="markdown").load_data(
    "./data/2023_canadian_budget.pdf"
)
index2 = VectorStoreIndex.from_documents(documents2)
query_engine2 = index2.as_query_engine()

response2 = query_engine2.query(
    "How much exactly was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget?"
)
print(response2)

We do!

$20 billion was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget.
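To compare what each parser extracted, you can search the raw text of both document lists for a phrase and print the surrounding excerpt. The following is a rough sketch; `find_snippets` is a hypothetical helper, and the usage lines assume each loaded document exposes its content as `.text`.

```python
import re


def find_snippets(texts, needle, width=60):
    """Return short excerpts around each case-insensitive match of `needle`."""
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    snippets = []
    for text in texts:
        for match in pattern.finditer(text):
            lo = max(0, match.start() - width)
            hi = min(len(text), match.end() + width)
            # Collapse whitespace so multi-line table cells print on one line.
            snippets.append(" ".join(text[lo:hi].split()))
    return snippets


# Usage with the two document lists built above:
# for label, docs in (("default reader", documents), ("LlamaParse", documents2)):
#     hits = find_snippets([d.text for d in docs], "tax credit")
#     print(label, len(hits), hits[:3])
```

Excerpts from the Markdown output should keep table cells next to their row labels, which is the kind of context the query engine needs to tie a figure to the right line item.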

You can always check the repo to see what this code looks like.

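As you can see, parsing quality makes a big difference to what the LLM can understand, even for relatively simple questions. Next, we'll look at how memory can help us answer more complex questions.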


Reposted from blog.csdn.net/xycxycooo/article/details/143569659