Openai Agent SDK 快速入门

一、引言

最近,OpenAI 正式发布了 Agent 开发三剑客 —— 内置工具集、Responses API 和开源 Agents SDK,标志着 AI 智能体开发进入标准化阶段。本文将基于官方文档和最新技术动态,系统讲解如何利用这些工具快速构建具备自主决策能力的 AI 智能体。

二、核心组件解析

  1. 三大内置工具技术架构

    • Web搜索工具
      底层采用GPT-4o的检索增强架构,支持实时网页抓取与向量数据库融合
      新增引用验证模块:通过语义分析自动校验搜索结果与查询的相关性,置信度阈值可配置
      典型应用场景:金融情报系统中,结合o1模型实现法律文件条款与市场数据的交叉验证,如某平台通过该工具发现收购案中的"控制权变更"条款,避免7500万美元债务风险
    • 文件搜索工具
      支持混合检索模式:向量检索(基于Sentence-BERT)+元数据过滤(支持SQL-like查询语法)
      集成RAG流水线:检索结果自动注入prompt,提升文档推理效率4倍以上(如BlueJ税务平台案例)
      企业级部署方案:通过分布式索引实现PB级文档秒级响应,支持动态热更新
    • Computer Use工具
      基于Operator技术的屏幕分析引擎:集成CV模型识别UI元素,支持跨平台操作录制/回放
      键鼠操作序列优化算法:自动生成最短操作路径,减少30%以上的无效动作
      典型应用:Unify ERP系统自动化,实现订单处理流程300%效率提升
  2. Responses API架构演进

    • 核心设计原则:
      多轮对话状态管理:支持嵌套工具调用链,上下文传递准确率达99.2%
      可观测性增强:通过tracking_id记录完整决策路径,支持生成可视化决策树
      成本优化:动态模型选择策略(基于任务复杂度自动切换4o/o1/3.5)
    • 协议对比:
      功能特性 Responses API Chat Completions API
      工具调用 内置支持(3大工具+自定义) 需外部集成
      多轮协作 原生支持 需开发者手动维护上下文
      响应模式 流式输出+异步回调 同步返回
      计费单元 token+工具调用 仅token
  3. Agents SDK企业级特性

    • 智能体编排引擎:
      支持BPMN 2.0标准的工作流定义,可视化编辑工具链
      动态负载均衡:根据智能体当前负载自动分配任务,吞吐量提升2.5倍
      故障熔断机制:支持重试策略、降级方案与错误隔离
    • 安全控制模块:
      输入验证:基于正则表达式的敏感词过滤+意图分类模型
      输出审查:集成RLHF价值观对齐模型,误判率<0.3%
      操作审计:全链路日志追踪,支持决策路径回溯分析
    • 多智能体协作模式:

三、快速上手指南

环境准备

pip install openai-agents

快速定义Agent

from agents import Agent, InputGuardrail,GuardrailFunctionOutput, Runner
from pydantic import BaseModel
import asyncio

class HomeworkOutput(BaseModel):
    is_homework: bool
    reasoning: str

guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking about homework.",
    output_type=HomeworkOutput,
)

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
)

history_tutor_agent = Agent(
    name="History Tutor",
    handoff_description="Specialist agent for historical questions",
    instructions="You provide assistance with historical queries. Explain important events and context clearly.",
)


async def homework_guardrail(ctx, agent, input_data):
    result = await Runner.run(guardrail_agent, input_data, context=ctx.context)
    final_output = result.final_output_as(HomeworkOutput)
    return GuardrailFunctionOutput(
        output_info=final_output,
        tripwire_triggered=not final_output.is_homework,
    )

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[history_tutor_agent, math_tutor_agent],
    input_guardrails=[
        InputGuardrail(guardrail_function=homework_guardrail),
    ],
)

async def main():
    result = await Runner.run(triage_agent, "who was the first president of the united states?")
    print(result.final_output)

    result = await Runner.run(triage_agent, "what is life")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

四、工具的使用

4.1 FileSearchTool

OpenAI Agent SDK 中的 FileSearchTool 是专为构建智能体(AI Agents)设计的检索工具,旨在帮助开发者快速从大量文档中提取关键信息。以下是其核心介绍:
核心功能

  • 文档检索能力
    支持从多种文件格式(如 PDF、Excel、Word、文本、代码文件等)中检索信息。
    结合向量搜索和关键词搜索技术,精准定位相关内容。
  • 高级特性
    元数据过滤:通过文件属性(如创建时间、作者、标签)筛选结果。
    查询优化:自动重写查询以提升准确性。
    自定义排序:根据相关性或其他指标对结果排序。
    直接搜索端点:可直接访问向量存储,减少模型预处理步骤。
  • 集成与灵活性
    无缝集成至 OpenAI 的 Responses API 和 Agents SDK,简化多工具协同。
    支持与其他工具(如 Web 搜索、计算机操作工具)组合使用。
import asyncio

from agents import Agent, FileSearchTool, Runner, trace


async def main():
    agent = Agent(
        name="File searcher",
        instructions="You are a helpful agent.",
        tools=[
            FileSearchTool(
                max_num_results=3,
                vector_store_ids=["vs_67bf88953f748191be42b462090e53e7"],
                include_search_results=True,
            )
        ],
    )

    with trace("File search example"):
        result = await Runner.run(
            agent, "Be concise, and tell me 1 sentence about Arrakis I might not know."
        )
        print(result.final_output)
        """
        Arrakis, the desert planet in Frank Herbert's "Dune," was inspired by the scarcity of water
        as a metaphor for oil and other finite resources.
        """

        print("\n".join([str(out) for out in result.new_items]))
        """
        {"id":"...", "queries":["Arrakis"], "results":[...]}
        """


if __name__ == "__main__":
    asyncio.run(main())

4.2 WebSearchTool

OpenAI Agent SDK中的WebSearchTool是一款基于ChatGPT同款搜索技术的实时网络检索工具,支持多轮对话和复杂查询,能为开发者提供带引用来源的准确信息。该工具可通过Responses API或Agents SDK无缝集成,默认支持gpt-4 ogpt-4 o-mini模型,在聊天补全API中则需使用专用模型gpt-4 o-search-previewgpt-4 o-mini-search-preview。它无需额外配置即可默认嵌入智能体,支持与文件搜索、计算机操作等工具协同工作,适用于实时问答、动态数据分析、内容生成等场景。目前处于预览阶段,检索费用按输入Token计费,未来可能调整定价。开发者可通过Python代码或REST API调用,结合自定义参数如用户地理位置和搜索上下文大小优化结果,同时需注意合规性和预览阶段的功能限制。该工具的推出显著增强了智能体处理实时信息的能力,推动了AI代理在电商、金融、研究等领域的应用。

import asyncio

from agents import Agent, Runner, WebSearchTool, trace


async def main():
    agent = Agent(
        name="Web searcher",
        instructions="You are a helpful agent.",
        tools=[WebSearchTool(user_location={
    
    "type": "approximate", "city": "New York"})],
    )

    with trace("Web search example"):
        result = await Runner.run(
            agent,
            "search the web for 'local sports news' and give me 1 interesting update in a sentence.",
        )
        print(result.final_output)
        # The New York Giants are reportedly pursuing quarterback Aaron Rodgers after his ...


if __name__ == "__main__":
    asyncio.run(main())

4.3 Response API

OpenAI API 为最先进的 AI模型提供了一个简单的接口,用于文本生成、自然语言处理、计算机视觉等

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    input="Write a one-sentence bedtime story about a unicorn."
)

print(response.output_text)

4.4 Computer Use Tool

导入模块

import asyncio
import base64
from typing import Literal, Union

from playwright.async_api import Browser, Page, Playwright, async_playwright

from agents import (
    Agent,
    AsyncComputer,
    Button,
    ComputerTool,
    Environment,
    ModelSettings,
    Runner,
    trace,
)
  • asyncio:用于实现异步编程。
  • base64:用于对图片数据进行 Base64 编码。
  • typing 模块中的 LiteralUnion:用于类型注解。
  • playwright.async_api:提供了异步的 Playwright API,用于自动化浏览器操作。
  • agents 模块:包含自定义的代理、工具等类。

主函数 main

async def main():
    async with LocalPlaywrightComputer() as computer:
        with trace("Computer use example"):
            agent = Agent(
                name="Browser user",
                instructions="You are a helpful agent.",
                tools=[ComputerTool(computer)],
                # Use the computer using model, and set truncation to auto because its required
                model="computer-use-preview",
                model_settings=ModelSettings(truncation="auto"),
            )
            result = await Runner.run(agent, "Search for SF sports news and summarize.")
            print(result.final_output)
  • 运用 async with 语句创建 LocalPlaywrightComputer 实例。
  • 创建一个 Agent 实例,该实例具备名称、指令、工具和模型设置。
  • 借助 Runner.run 方法让代理执行搜索旧金山体育新闻并总结的任务。
  • 打印最终结果。

键映射字典 CUA_KEY_TO_PLAYWRIGHT_KEY

CUA_KEY_TO_PLAYWRIGHT_KEY = {
    
    
    "/": "Divide",
    "\\": "Backslash",
    "alt": "Alt",
    # 其他键映射...
}

此字典把自定义的键名映射到 Playwright 所支持的键名。

LocalPlaywrightComputer

class LocalPlaywrightComputer(AsyncComputer):
    """A computer, implemented using a local Playwright browser."""

    def __init__(self):
        self._playwright: Union[Playwright, None] = None
        self._browser: Union[Browser, None] = None
        self._page: Union[Page, None] = None
  • 继承自 AsyncComputer 类,利用本地 Playwright 浏览器来实现计算机功能。
  • __init__ 方法对 _playwright_browser_page 属性进行初始化。

_get_browser_and_page 方法

async def _get_browser_and_page(self) -> tuple[Browser, Page]:
    width, height = self.dimensions
    launch_args = [f"--window-size={
      
      width},{
      
      height}"]
    browser = await self.playwright.chromium.launch(headless=False, args=launch_args)
    page = await browser.new_page()
    await page.set_viewport_size({
    
    "width": width, "height": height})
    await page.goto("https://www.bing.com")
    return browser, page
  • 启动 Chromium 浏览器,创建新页面。
  • 设置页面视口大小。
  • 导航到 Bing 搜索页面。

__aenter____aexit__ 方法

async def __aenter__(self):
    self._playwright = await async_playwright().start()
    self._browser, self._page = await self._get_browser_and_page()
    return self

async def __aexit__(self, exc_type, exc_val, exc_tb):
    if self._browser:
        await self._browser.close()
    if self._playwright:
        await self._playwright.stop()
  • __aenter__ 方法:启动 Playwright 并获取浏览器和页面。
  • __aexit__ 方法:关闭浏览器并停止 Playwright。

属性方法

@property
def playwright(self) -> Playwright:
    assert self._playwright is not None
    return self._playwright

@property
def browser(self) -> Browser:
    assert self._browser is not None
    return self._browser

@property
def page(self) -> Page:
    assert self._page is not None
    return self._page

@property
def environment(self) -> Environment:
    return "browser"

@property
def dimensions(self) -> tuple[int, int]:
    return (1024, 768)
  • 这些属性方法用于获取 Playwright、浏览器、页面、环境和页面尺寸。

操作方法

async def screenshot(self) -> str:
    png_bytes = await self.page.screenshot(full_page=False)
    return base64.b64encode(png_bytes).decode("utf-8")

async def click(self, x: int, y: int, button: Button = "left") -> None:
    playwright_button: Literal["left", "middle", "right"] = "left"
    if button in ("left", "right", "middle"):
        playwright_button = button  # type: ignore
    await self.page.mouse.click(x, y, button=playwright_button)

# 其他操作方法...
  • 这些方法实现了截图、点击、双击、滚动、输入、等待、移动、按键和拖动等操作。

完整代码:

import asyncio
import base64
from typing import Literal, Union

from playwright.async_api import Browser, Page, Playwright, async_playwright

from agents import (
    Agent,
    AsyncComputer,
    Button,
    ComputerTool,
    Environment,
    ModelSettings,
    Runner,
    trace,
)

# Uncomment to see very verbose logs
# import logging
# logging.getLogger("openai.agents").setLevel(logging.DEBUG)
# logging.getLogger("openai.agents").addHandler(logging.StreamHandler())


async def main():
    async with LocalPlaywrightComputer() as computer:
        with trace("Computer use example"):
            agent = Agent(
                name="Browser user",
                instructions="You are a helpful agent.",
                tools=[ComputerTool(computer)],
                # Use the computer using model, and set truncation to auto because its required
                model="computer-use-preview",
                model_settings=ModelSettings(truncation="auto"),
            )
            result = await Runner.run(agent, "Search for SF sports news and summarize.")
            print(result.final_output)


CUA_KEY_TO_PLAYWRIGHT_KEY = {
    
    
    "/": "Divide",
    "\\": "Backslash",
    "alt": "Alt",
    "arrowdown": "ArrowDown",
    "arrowleft": "ArrowLeft",
    "arrowright": "ArrowRight",
    "arrowup": "ArrowUp",
    "backspace": "Backspace",
    "capslock": "CapsLock",
    "cmd": "Meta",
    "ctrl": "Control",
    "delete": "Delete",
    "end": "End",
    "enter": "Enter",
    "esc": "Escape",
    "home": "Home",
    "insert": "Insert",
    "option": "Alt",
    "pagedown": "PageDown",
    "pageup": "PageUp",
    "shift": "Shift",
    "space": " ",
    "super": "Meta",
    "tab": "Tab",
    "win": "Meta",
}


class LocalPlaywrightComputer(AsyncComputer):
    """A computer, implemented using a local Playwright browser."""

    def __init__(self):
        self._playwright: Union[Playwright, None] = None
        self._browser: Union[Browser, None] = None
        self._page: Union[Page, None] = None

    async def _get_browser_and_page(self) -> tuple[Browser, Page]:
        width, height = self.dimensions
        launch_args = [f"--window-size={
      
      width},{
      
      height}"]
        browser = await self.playwright.chromium.launch(headless=False, args=launch_args)
        page = await browser.new_page()
        await page.set_viewport_size({
    
    "width": width, "height": height})
        await page.goto("https://www.bing.com")
        return browser, page

    async def __aenter__(self):
        # Start Playwright and call the subclass hook for getting browser/page
        self._playwright = await async_playwright().start()
        self._browser, self._page = await self._get_browser_and_page()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self._browser:
            await self._browser.close()
        if self._playwright:
            await self._playwright.stop()

    @property
    def playwright(self) -> Playwright:
        assert self._playwright is not None
        return self._playwright

    @property
    def browser(self) -> Browser:
        assert self._browser is not None
        return self._browser

    @property
    def page(self) -> Page:
        assert self._page is not None
        return self._page

    @property
    def environment(self) -> Environment:
        return "browser"

    @property
    def dimensions(self) -> tuple[int, int]:
        return (1024, 768)

    async def screenshot(self) -> str:
        """Capture only the viewport (not full_page)."""
        png_bytes = await self.page.screenshot(full_page=False)
        return base64.b64encode(png_bytes).decode("utf-8")

    async def click(self, x: int, y: int, button: Button = "left") -> None:
        playwright_button: Literal["left", "middle", "right"] = "left"

        # Playwright only supports left, middle, right buttons
        if button in ("left", "right", "middle"):
            playwright_button = button  # type: ignore

        await self.page.mouse.click(x, y, button=playwright_button)

    async def double_click(self, x: int, y: int) -> None:
        await self.page.mouse.dblclick(x, y)

    async def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None:
        await self.page.mouse.move(x, y)
        await self.page.evaluate(f"window.scrollBy({
      
      scroll_x}, {
      
      scroll_y})")

    async def type(self, text: str) -> None:
        await self.page.keyboard.type(text)

    async def wait(self) -> None:
        await asyncio.sleep(1)

    async def move(self, x: int, y: int) -> None:
        await self.page.mouse.move(x, y)

    async def keypress(self, keys: list[str]) -> None:
        for key in keys:
            mapped_key = CUA_KEY_TO_PLAYWRIGHT_KEY.get(key.lower(), key)
            await self.page.keyboard.press(mapped_key)

    async def drag(self, path: list[tuple[int, int]]) -> None:
        if not path:
            return
        await self.page.mouse.move(path[0][0], path[0][1])
        await self.page.mouse.down()
        for px, py in path[1:]:
            await self.page.mouse.move(px, py)
        await self.page.mouse.up()


if __name__ == "__main__":
    asyncio.run(main())

此代码实现了一个自动化浏览器操作的异步程序,借助 Playwright 库实现了浏览器的各种操作,让代理能够在浏览器中完成搜索和总结任务。