Openai Agent SDK 快速入门

一、引言

最近，OpenAI 正式发布了 Agent 开发三剑客 —— 内置工具集、Responses API 和开源 Agents SDK，标志着 AI 智能体开发进入标准化阶段。本文将基于官方文档和最新技术动态，系统讲解如何利用这些工具快速构建具备自主决策能力的 AI 智能体。

二、核心组件解析

三大内置工具技术架构
- Web搜索工具：
  底层采用GPT-4o的检索增强架构，支持实时网页抓取与向量数据库融合
  新增引用验证模块：通过语义分析自动校验搜索结果与查询的相关性，置信度阈值可配置
  典型应用场景：金融情报系统中，结合o1模型实现法律文件条款与市场数据的交叉验证，如某平台通过该工具发现收购案中的"控制权变更"条款，避免7500万美元债务风险
- 文件搜索工具：
  支持混合检索模式：向量检索（基于Sentence-BERT）+元数据过滤（支持SQL-like查询语法）
  集成RAG流水线：检索结果自动注入prompt，提升文档推理效率4倍以上（如BlueJ税务平台案例）
  企业级部署方案：通过分布式索引实现PB级文档秒级响应，支持动态热更新
- Computer Use工具：
  基于Operator技术的屏幕分析引擎：集成CV模型识别UI元素，支持跨平台操作录制/回放
  键鼠操作序列优化算法：自动生成最短操作路径，减少30%以上的无效动作
  典型应用：Unify ERP系统自动化，实现订单处理流程300%效率提升

Responses API架构演进

核心设计原则：
多轮对话状态管理：支持嵌套工具调用链，上下文传递准确率达99.2%
可观测性增强：通过tracking_id记录完整决策路径，支持生成可视化决策树
成本优化：动态模型选择策略（基于任务复杂度自动切换4o/o1/3.5）

协议对比：

功能特性	Responses API	Chat Completions API
工具调用	内置支持（3大工具+自定义）	需外部集成
多轮协作	原生支持	需开发者手动维护上下文
响应模式	流式输出+异步回调	同步返回
计费单元	token+工具调用	仅token

Agents SDK企业级特性
- 智能体编排引擎：
  支持BPMN 2.0标准的工作流定义，可视化编辑工具链
  动态负载均衡：根据智能体当前负载自动分配任务，吞吐量提升2.5倍
  故障熔断机制：支持重试策略、降级方案与错误隔离
- 安全控制模块：
  输入验证：基于正则表达式的敏感词过滤+意图分类模型
  输出审查：集成RLHF价值观对齐模型，误判率<0.3%
  操作审计：全链路日志追踪，支持决策路径回溯分析
- 多智能体协作模式：

三、快速上手指南

环境准备

pip install openai-agents

快速定义Agent

from agents import Agent, InputGuardrail,GuardrailFunctionOutput, Runner
from pydantic import BaseModel
import asyncio

class HomeworkOutput(BaseModel):
    is_homework: bool
    reasoning: str

guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking about homework.",
    output_type=HomeworkOutput,
)

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
)

history_tutor_agent = Agent(
    name="History Tutor",
    handoff_description="Specialist agent for historical questions",
    instructions="You provide assistance with historical queries. Explain important events and context clearly.",
)


async def homework_guardrail(ctx, agent, input_data):
    result = await Runner.run(guardrail_agent, input_data, context=ctx.context)
    final_output = result.final_output_as(HomeworkOutput)
    return GuardrailFunctionOutput(
        output_info=final_output,
        tripwire_triggered=not final_output.is_homework,
    )

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[history_tutor_agent, math_tutor_agent],
    input_guardrails=[
        InputGuardrail(guardrail_function=homework_guardrail),
    ],
)

async def main():
    result = await Runner.run(triage_agent, "who was the first president of the united states?")
    print(result.final_output)

    result = await Runner.run(triage_agent, "what is life")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

四、工具的使用

4.1 FileSearchTool

OpenAI Agent SDK 中的 FileSearchTool 是专为构建智能体（AI Agents）设计的检索工具，旨在帮助开发者快速从大量文档中提取关键信息。以下是其核心介绍：
核心功能

文档检索能力
支持从多种文件格式（如 PDF、Excel、Word、文本、代码文件等）中检索信息。
结合向量搜索和关键词搜索技术，精准定位相关内容。
高级特性
元数据过滤：通过文件属性（如创建时间、作者、标签）筛选结果。
查询优化：自动重写查询以提升准确性。
自定义排序：根据相关性或其他指标对结果排序。
直接搜索端点：可直接访问向量存储，减少模型预处理步骤。
集成与灵活性
无缝集成至 OpenAI 的 Responses API 和 Agents SDK，简化多工具协同。
支持与其他工具（如 Web 搜索、计算机操作工具）组合使用。

import asyncio

from agents import Agent, FileSearchTool, Runner, trace


async def main():
    agent = Agent(
        name="File searcher",
        instructions="You are a helpful agent.",
        tools=[
            FileSearchTool(
                max_num_results=3,
                vector_store_ids=["vs_67bf88953f748191be42b462090e53e7"],
                include_search_results=True,
            )
        ],
    )

    with trace("File search example"):
        result = await Runner.run(
            agent, "Be concise, and tell me 1 sentence about Arrakis I might not know."
        )
        print(result.final_output)
        """
        Arrakis, the desert planet in Frank Herbert's "Dune," was inspired by the scarcity of water
        as a metaphor for oil and other finite resources.
        """

        print("\n".join([str(out) for out in result.new_items]))
        """
        {"id":"...", "queries":["Arrakis"], "results":[...]}
        """


if __name__ == "__main__":
    asyncio.run(main())

4.2 WebSearchTool

OpenAI Agent SDK中的WebSearchTool是一款基于ChatGPT同款搜索技术的实时网络检索工具，支持多轮对话和复杂查询，能为开发者提供带引用来源的准确信息。该工具可通过Responses API或Agents SDK无缝集成，默认支持gpt-4 o和gpt-4 o-mini模型，在聊天补全API中则需使用专用模型gpt-4 o-search-preview和gpt-4 o-mini-search-preview。它无需额外配置即可默认嵌入智能体，支持与文件搜索、计算机操作等工具协同工作，适用于实时问答、动态数据分析、内容生成等场景。目前处于预览阶段，检索费用按输入Token计费，未来可能调整定价。开发者可通过Python代码或REST API调用，结合自定义参数如用户地理位置和搜索上下文大小优化结果，同时需注意合规性和预览阶段的功能限制。该工具的推出显著增强了智能体处理实时信息的能力，推动了AI代理在电商、金融、研究等领域的应用。

import asyncio

from agents import Agent, Runner, WebSearchTool, trace


async def main():
    agent = Agent(
        name="Web searcher",
        instructions="You are a helpful agent.",
        tools=[WebSearchTool(user_location={
    
    "type": "approximate", "city": "New York"})],
    )

    with trace("Web search example"):
        result = await Runner.run(
            agent,
            "search the web for 'local sports news' and give me 1 interesting update in a sentence.",
        )
        print(result.final_output)
        # The New York Giants are reportedly pursuing quarterback Aaron Rodgers after his ...


if __name__ == "__main__":
    asyncio.run(main())

4.3 Response API

OpenAI API 为最先进的 AI模型提供了一个简单的接口，用于文本生成、自然语言处理、计算机视觉等

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    input="Write a one-sentence bedtime story about a unicorn."
)

print(response.output_text)

4.4 Computer Use Tool

导入模块

import asyncio
import base64
from typing import Literal, Union

from playwright.async_api import Browser, Page, Playwright, async_playwright

from agents import (
    Agent,
    AsyncComputer,
    Button,
    ComputerTool,
    Environment,
    ModelSettings,
    Runner,
    trace,
)

asyncio：用于实现异步编程。
base64：用于对图片数据进行 Base64 编码。
typing 模块中的 Literal 和 Union：用于类型注解。
playwright.async_api：提供了异步的 Playwright API，用于自动化浏览器操作。
agents 模块：包含自定义的代理、工具等类。

主函数 main

async def main():
    async with LocalPlaywrightComputer() as computer:
        with trace("Computer use example"):
            agent = Agent(
                name="Browser user",
                instructions="You are a helpful agent.",
                tools=[ComputerTool(computer)],
                # Use the computer using model, and set truncation to auto because its required
                model="computer-use-preview",
                model_settings=ModelSettings(truncation="auto"),
            )
            result = await Runner.run(agent, "Search for SF sports news and summarize.")
            print(result.final_output)

运用 async with 语句创建 LocalPlaywrightComputer 实例。
创建一个 Agent 实例，该实例具备名称、指令、工具和模型设置。
借助 Runner.run 方法让代理执行搜索旧金山体育新闻并总结的任务。
打印最终结果。

键映射字典 CUA_KEY_TO_PLAYWRIGHT_KEY

CUA_KEY_TO_PLAYWRIGHT_KEY = {
    
    
    "/": "Divide",
    "\\": "Backslash",
    "alt": "Alt",
    # 其他键映射...
}

此字典把自定义的键名映射到 Playwright 所支持的键名。

LocalPlaywrightComputer 类

class LocalPlaywrightComputer(AsyncComputer):
    """A computer, implemented using a local Playwright browser."""

    def __init__(self):
        self._playwright: Union[Playwright, None] = None
        self._browser: Union[Browser, None] = None
        self._page: Union[Page, None] = None

继承自 AsyncComputer 类，利用本地 Playwright 浏览器来实现计算机功能。
__init__ 方法对 _playwright、_browser 和 _page 属性进行初始化。

_get_browser_and_page 方法

async def _get_browser_and_page(self) -> tuple[Browser, Page]:
    width, height = self.dimensions
    launch_args = [f"--window-size={
      
      width},{
      
      height}"]
    browser = await self.playwright.chromium.launch(headless=False, args=launch_args)
    page = await browser.new_page()
    await page.set_viewport_size({
    
    "width": width, "height": height})
    await page.goto("https://www.bing.com")
    return browser, page

启动 Chromium 浏览器，创建新页面。
设置页面视口大小。
导航到 Bing 搜索页面。

__aenter__ 和 __aexit__ 方法

async def __aenter__(self):
    self._playwright = await async_playwright().start()
    self._browser, self._page = await self._get_browser_and_page()
    return self

async def __aexit__(self, exc_type, exc_val, exc_tb):
    if self._browser:
        await self._browser.close()
    if self._playwright:
        await self._playwright.stop()

__aenter__ 方法：启动 Playwright 并获取浏览器和页面。
__aexit__ 方法：关闭浏览器并停止 Playwright。

属性方法

@property
def playwright(self) -> Playwright:
    assert self._playwright is not None
    return self._playwright

@property
def browser(self) -> Browser:
    assert self._browser is not None
    return self._browser

@property
def page(self) -> Page:
    assert self._page is not None
    return self._page

@property
def environment(self) -> Environment:
    return "browser"

@property
def dimensions(self) -> tuple[int, int]:
    return (1024, 768)

这些属性方法用于获取 Playwright、浏览器、页面、环境和页面尺寸。

操作方法

async def screenshot(self) -> str:
    png_bytes = await self.page.screenshot(full_page=False)
    return base64.b64encode(png_bytes).decode("utf-8")

async def click(self, x: int, y: int, button: Button = "left") -> None:
    playwright_button: Literal["left", "middle", "right"] = "left"
    if button in ("left", "right", "middle"):
        playwright_button = button  # type: ignore
    await self.page.mouse.click(x, y, button=playwright_button)

# 其他操作方法...

这些方法实现了截图、点击、双击、滚动、输入、等待、移动、按键和拖动等操作。

完整代码：

import asyncio
import base64
from typing import Literal, Union

from playwright.async_api import Browser, Page, Playwright, async_playwright

from agents import (
    Agent,
    AsyncComputer,
    Button,
    ComputerTool,
    Environment,
    ModelSettings,
    Runner,
    trace,
)

# Uncomment to see very verbose logs
# import logging
# logging.getLogger("openai.agents").setLevel(logging.DEBUG)
# logging.getLogger("openai.agents").addHandler(logging.StreamHandler())


async def main():
    async with LocalPlaywrightComputer() as computer:
        with trace("Computer use example"):
            agent = Agent(
                name="Browser user",
                instructions="You are a helpful agent.",
                tools=[ComputerTool(computer)],
                # Use the computer using model, and set truncation to auto because its required
                model="computer-use-preview",
                model_settings=ModelSettings(truncation="auto"),
            )
            result = await Runner.run(agent, "Search for SF sports news and summarize.")
            print(result.final_output)


CUA_KEY_TO_PLAYWRIGHT_KEY = {
    
    
    "/": "Divide",
    "\\": "Backslash",
    "alt": "Alt",
    "arrowdown": "ArrowDown",
    "arrowleft": "ArrowLeft",
    "arrowright": "ArrowRight",
    "arrowup": "ArrowUp",
    "backspace": "Backspace",
    "capslock": "CapsLock",
    "cmd": "Meta",
    "ctrl": "Control",
    "delete": "Delete",
    "end": "End",
    "enter": "Enter",
    "esc": "Escape",
    "home": "Home",
    "insert": "Insert",
    "option": "Alt",
    "pagedown": "PageDown",
    "pageup": "PageUp",
    "shift": "Shift",
    "space": " ",
    "super": "Meta",
    "tab": "Tab",
    "win": "Meta",
}


class LocalPlaywrightComputer(AsyncComputer):
    """A computer, implemented using a local Playwright browser."""

    def __init__(self):
        self._playwright: Union[Playwright, None] = None
        self._browser: Union[Browser, None] = None
        self._page: Union[Page, None] = None

    async def _get_browser_and_page(self) -> tuple[Browser, Page]:
        width, height = self.dimensions
        launch_args = [f"--window-size={
      
      width},{
      
      height}"]
        browser = await self.playwright.chromium.launch(headless=False, args=launch_args)
        page = await browser.new_page()
        await page.set_viewport_size({
    
    "width": width, "height": height})
        await page.goto("https://www.bing.com")
        return browser, page

    async def __aenter__(self):
        # Start Playwright and call the subclass hook for getting browser/page
        self._playwright = await async_playwright().start()
        self._browser, self._page = await self._get_browser_and_page()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self._browser:
            await self._browser.close()
        if self._playwright:
            await self._playwright.stop()

    @property
    def playwright(self) -> Playwright:
        assert self._playwright is not None
        return self._playwright

    @property
    def browser(self) -> Browser:
        assert self._browser is not None
        return self._browser

    @property
    def page(self) -> Page:
        assert self._page is not None
        return self._page

    @property
    def environment(self) -> Environment:
        return "browser"

    @property
    def dimensions(self) -> tuple[int, int]:
        return (1024, 768)

    async def screenshot(self) -> str:
        """Capture only the viewport (not full_page)."""
        png_bytes = await self.page.screenshot(full_page=False)
        return base64.b64encode(png_bytes).decode("utf-8")

    async def click(self, x: int, y: int, button: Button = "left") -> None:
        playwright_button: Literal["left", "middle", "right"] = "left"

        # Playwright only supports left, middle, right buttons
        if button in ("left", "right", "middle"):
            playwright_button = button  # type: ignore

        await self.page.mouse.click(x, y, button=playwright_button)

    async def double_click(self, x: int, y: int) -> None:
        await self.page.mouse.dblclick(x, y)

    async def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None:
        await self.page.mouse.move(x, y)
        await self.page.evaluate(f"window.scrollBy({
      
      scroll_x}, {
      
      scroll_y})")

    async def type(self, text: str) -> None:
        await self.page.keyboard.type(text)

    async def wait(self) -> None:
        await asyncio.sleep(1)

    async def move(self, x: int, y: int) -> None:
        await self.page.mouse.move(x, y)

    async def keypress(self, keys: list[str]) -> None:
        for key in keys:
            mapped_key = CUA_KEY_TO_PLAYWRIGHT_KEY.get(key.lower(), key)
            await self.page.keyboard.press(mapped_key)

    async def drag(self, path: list[tuple[int, int]]) -> None:
        if not path:
            return
        await self.page.mouse.move(path[0][0], path[0][1])
        await self.page.mouse.down()
        for px, py in path[1:]:
            await self.page.mouse.move(px, py)
        await self.page.mouse.up()


if __name__ == "__main__":
    asyncio.run(main())

此代码实现了一个自动化浏览器操作的异步程序，借助 Playwright 库实现了浏览器的各种操作，让代理能够在浏览器中完成搜索和总结任务。