解密prompt系列56.Agent context Engineering - 单智能体代码剖析

妮妮安琪 發表於 2025-7-10 07:40:00

解密prompt系列56.Agent context Engineering - 单智能体代码剖析

<p>近期关于智能体设计有诸多观点，一个关键点让我豁然开朗——无论智能体是1个还是多个，是编排驱动还是自主决策，是静态预定义还是动态生成，Context上下文的管理机制始终是设计的核心命脉。它决定了：每个节点使用哪些信息？分别更新或修改哪些信息？多步骤间如何传递？智能体间是否共享、如何共享？后续篇章我们将剖析多个热门开源项目，一探它们如何驾驭Context。本章聚焦单智能体设计，选取两个代表性框架：模仿OpenAI深度研究范式的Gemini-fullstack（编排式）与模仿Manus的OpenManus（自主式）。</p>
<h2 id="框架对比">框架对比</h2>
<p>先来整体对比下两个框架，这样懒得看细节的盆友们就可以只看下表了~</p>
<table>
<thead>
<tr>
<th style="text-align: left">特性</th>
<th style="text-align: left">Gemini Deep Search (编排式)</th>
<th style="text-align: left">OpenManus Flow Mode (规划式自主)</th>
<th style="text-align: left">OpenManus Manus Mode (纯自主)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><strong>智能体类型</strong></td>
<td style="text-align: left">单智能体，编排驱动</td>
<td style="text-align: left"><strong>单智能体</strong>，全局规划 + 分步ReAct循环</td>
<td style="text-align: left">单智能体，ReAct循环</td>
</tr>
<tr>
<td style="text-align: left"><strong>任务分解</strong></td>
<td style="text-align: left">固定流程节点</td>
<td style="text-align: left">全局Plan</td>
<td style="text-align: left">动态思考下一步(Think)</td>
</tr>
<tr>
<td style="text-align: left"><strong>Context 范围</strong></td>
<td style="text-align: left"><strong>节点级隔离</strong> (每节点用特定输入)</td>
<td style="text-align: left"><strong>Step级隔离</strong> (每Step用Plan状态+当前任务)</td>
<td style="text-align: left"><strong>线性增长+窗口截断</strong> (全历史)</td>
</tr>
<tr>
<td style="text-align: left"><strong>Context 传递</strong></td>
<td style="text-align: left">传递任务结果 (Query列表, 摘要文本)</td>
<td style="text-align: left">外层：传递Plan状态 + Step结果字符串；内层全部历史</td>
<td style="text-align: left">传递完整ReAct历史 (截断后)</td>
</tr>
<tr>
<td style="text-align: left"><strong>状态管理</strong></td>
<td style="text-align: left">无显式状态，依赖数据流</td>
<td style="text-align: left">显式Plan & Step状态管理</td>
<td style="text-align: left">隐含在消息历史中</td>
</tr>
<tr>
<td style="text-align: left"><strong>优势</strong></td>
<td style="text-align: left">流程清晰可控，模块化，引用处理优雅</td>
<td style="text-align: left">步骤Context轻量，潜在减少迭代次数</td>
<td style="text-align: left">灵活性高</td>
</tr>
<tr>
<td style="text-align: left"><strong>挑战</strong></td>
<td style="text-align: left">灵活性较低，节点间“思考”不共享</td>
<td style="text-align: left">Plan质量依赖（或需要动态调整），Step间Context隔离可能导致信息断层/冲突</td>
<td style="text-align: left">Context膨胀，长程依赖易丢失，多轮消息会引入噪音</td>
</tr>
</tbody>
</table>
<h2 id="gemini-deep-search--编排智能体">Gemini Deep Search- 编排智能体</h2>
<blockquote>
<ul>
<li>gemini-fullstack-langgraph-quickstart</li>
</ul>
</blockquote>
<p>Gemini Deep Search 是一个典型的编排式智能体。其执行流程预先定义，核心在于引入了反思节点，用于动态判断信息收集是否充分。流程清晰简洁：</p>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/1326688/202507/1326688-20250710073828424-1881725172.png" class="lazyload"></p>
<p>图释：Gemini Deep Search的核心编排流程，包含查询生成、并行搜索、反思评估、路由决策和最终答案生成五个关键节点，通过反思节点实现循环控制。</p>
<h3 id="1-查询生成generate-query">1. 查询生成（Generate Query）</h3>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/1326688/202507/1326688-20250710073828246-1833371495.png" class="lazyload"></p>
<ul>
<li>核心亮点：将“思考过程”工具化/结构化输出。</li>
<li>使用Pydantic模型强制输出包含查询列表(query)和推理依据(rationale)</li>
</ul>
<pre><code class="language-python">
class SearchQueryList(BaseModel):
query: List = Field(
   description="A list of search queries to be used for web research."
)
rationale: str = Field(
   description="A brief explanation of why these queries are relevant to the research topic."
)
</code></pre>
<p>实际应用中会发现把 <strong>思考工具化（结构化）</strong> 有很多优点:</p>
<ul>
<li>模型无关性：不依赖模型的“思考”原生能力，任何支持结构化输出的模型皆可。</li>
<li>简洁可控：结构化输出比模型自由生成的思考通常更简短、更聚焦，避免冗余和发散。。</li>
</ul>
<h3 id="2-并行搜索摘要web_research">2. 并行搜索+摘要（Web_research)</h3>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/1326688/202507/1326688-20250710073828334-489604207.png" class="lazyload"></p>
<p>然后就是基于多个query的并行搜索模块这里直接使用了Langgraph自带的Send多线程并发模式，然后直接让大模型基于检索上文进行总结。这里可参考不多，因为引用生成等逻辑在Gemini的API中，用开源模型的盆友需要重新适配。</p>
<p>不过有意思的是现在<strong>如何给模型推理插入引用</strong>，原来多数都是在指令中加入要求，让模型一边推理一边生成引用序号<code>()</code>，不过在新的模型能力下有了很多天马星空的方案。像Claude给出过先直接进行无引用推理，然后再让模型重新基于推理结果，在不修改原文的基础上，插入引用的markdown链接。</p>
<p>这里Google是直接推理API中集成了类似能力，哈哈我也没用过Gemini的API，不过看代码，应该是类似以下的结构, 会通过结构化推理（哈哈Google很爱用结构化推理，其实个人也觉得不论是工具还是Function Calling或者是Thinking，最底层的对接方案还是结构化推理），返回引用序号列表对应的文字段落的起止位置。</p>
<pre><code class="language-python"># Gemini API响应结构
response.candidates.grounding_metadata = {
"grounding_supports": [
   {
         "segment": {"start_index": 0, "end_index": 50},
         "grounding_chunk_indices":
   }
],
"grounding_chunks": [
   {
         "web": {
            "uri": "https://example.com/page1",
            "title": "Example Page 1"
         }
   }
]
}
</code></pre>
<p>考虑到这里涉及两次模型推理，第一次就是每个query的搜索总结，第二次是最终基于所有总结段落的二次汇总推理。因此这里项目在本次推理（第一次）中把citation以markdown超链接的格式插回到了原文中，这样二次推理可以直接生成引用链接。（URL进行了缩写，降低推理token和copy出错的可能性）</p>
<h3 id="3-反思评估reflection">3. 反思评估（Reflection）</h3>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/1326688/202507/1326688-20250710073828366-1909781619.png" class="lazyload"></p>
<ul>
<li>评估当前收集的摘要信息是否足以回答用户问题。</li>
<li>同样采用结构化输出，个人实践的优化点：可扩展Reflection模型，加入reasoning字段，让模型先分析“回答用户需要什么信息？”、“当前已有哪些信息？”，再做出判断和提出补充查询，使决策更透明、依据更充分。</li>
</ul>
<pre><code class="language-python">class Reflection(BaseModel):
is_sufficient: bool = Field(
   description="Whether the provided summaries are sufficient to answer the user's question."
)
knowledge_gap: str = Field(
   description="A description of what information is missing or needs clarification."
)
follow_up_queries: List = Field(
   description="A list of follow-up queries to address the knowledge gap."
)

</code></pre>
<h3 id="4-决策路由router">4. 决策路由（Router）</h3>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/1326688/202507/1326688-20250710073828276-908737757.png" class="lazyload"></p>
<ul>
<li>根据Reflection节点的输出 (is_sufficient) 和预设的最大循环次数，决定流程走向（继续搜索Generate Query或进入Finalize Answer）。</li>
<li>Context管理：此节点本身不修改Context，仅基于Reflection的Context进行流程控制。</li>
</ul>
<h3 id="5-生成最终答案finalize-answer">5. 生成最终答案（Finalize Answer）</h3>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/1326688/202507/1326688-20250710073828430-795911744.png" class="lazyload"></p>
<ul>
<li>汇总所有步骤收集到的摘要信息（已包含Markdown引用链接）</li>
<li>进行最终推理，生成回答，并保留摘要中已嵌入的引用信息。</li>
</ul>
<h3 id="context管理">Context管理</h3>
<p><strong>Gemini的Context管理</strong></p>
<ul>
<li>模块化隔离：每个节点聚焦特定任务，使用特定的Context输入（如Generate Query只用原始Query，Web Research用特定Query列表，Reflection用所有摘要）。</li>
<li>无状态传递：节点间不共享“推理状态”上下文（如之前的思考过程），主要传递任务结果（Query列表、摘要文本）。</li>
</ul>
<h2 id="openmanus---自主智能体">OpenManus - 自主智能体</h2>
<blockquote>
<ul>
<li>OpenManus</li>
</ul>
</blockquote>
<p>OpenManus 提供了两种模式：Manus Mode（基础ReAct）和Flow Mode（规划驱动）。虽然项目将Flow称为“多智能体”，但从Context管理角度看，更像是单智能体的两种任务分解策略：<strong>Manus是局部规划+即时执行，Flow是全局规划（Plan）+分步执行（Manus）。</strong></p>
<h3 id="manus-模式经典react循环">Manus 模式：经典ReAct循环</h3>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/1326688/202507/1326688-20250710073828219-1488599321.png" class="lazyload"></p>
<p>Manus模式本质是ReAct循环：思考(Think)->行动(Act)->观察(Observe)，循环执行直至任务完成。核心流程：</p>
<ul>
<li>Think：基于当前Context（用户问题+历史消息+可用工具描述），模型决定下一步动作（调用哪个工具及其参数）。</li>
<li>Act：执行所选工具（如browser-use进行复杂网页交互操作、文本编辑器）。</li>
<li>Observe：将工具执行结果作为ToolMessage加入Context。</li>
<li>循环上述步骤，直到Think选择终止工具。</li>
</ul>
<p><strong>Manus的Context管理</strong><br>
线性增长：整个任务由一个智能体完成，Context随执行步骤线性增长，每一步都使用前置的所有message信息。</p>
<h3 id="flow-模式">Flow 模式</h3>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/1326688/202507/1326688-20250710073828308-624267505.png" class="lazyload"></p>
<p>Flow的核心思想是引入全局Plan规划器。在当前模型能力下，先规划再执行有助于：</p>
<ul>
<li>简化步骤Context：每个Manus步骤只需关注当前Step和Plan状态，上下文更轻量。</li>
<li>减少迭代次数：全局视野可能降低智能体陷入局部循环的概率。</li>
<li>潜在挑战：步骤间Context隔离可能导致信息重复/冲突；全局规划器传递任务时可能丢失细节（Context Gap）。</li>
</ul>
<p>Plan工具设计 (核心)： Plan本身通过结构化工具实现管理：</p>
<ul>
<li>两层结构： Plan -> Steps。</li>
<li>操作完备：创建(Create)、更新(Update)、列表(List)、获取(Get)、激活(Set Active)、标记步骤状态(Mark Step)、删除(Delete)</li>
<li>状态跟踪： Step状态包括未开始(not_started)、进行中(in_progress)、完成(completed)、阻塞(blocked)。</li>
<li>核心参数示例如下</li>
</ul>
<pre><code class="language-python">class PlanningTool(BaseTool):
"""
A planning tool that allows the agent to create and manage plans for solving complex tasks.
The tool provides functionality for creating plans, updating plan steps, and tracking progress.
"""

name: str = "planning"
description: str = _PLANNING_TOOL_DESCRIPTION
parameters: dict = {
   "type": "object",
   "properties": {
         "command": {
            "description": "The command to execute. Available commands: create, update, list, get, set_active, mark_step, delete.",
            "enum": [
               "create",
               "update",
               "list",
               "get",
               "set_active",
               "mark_step",
               "delete",
            ],
            "type": "string",
         },
         "plan_id": {
            "description": "Unique identifier for the plan. Required for create, update, set_active, and delete commands. Optional for get and mark_step (uses active plan if not specified).",
            "type": "string",
         },
         "title": {
            "description": "Title for the plan. Required for create command, optional for update command.",
            "type": "string",
         },
         "steps": {
            "description": "List of plan steps. Required for create command, optional for update command.",
            "type": "array",
            "items": {"type": "string"},
         },
         "step_index": {
            "description": "Index of the step to update (0-based). Required for mark_step command.",
            "type": "integer",
         },
         "step_status": {
            "description": "Status to set for a step. Used with mark_step command.",
            "enum": ["not_started", "in_progress", "completed", "blocked"],
            "type": "string",
         },
         "step_notes": {
            "description": "Additional notes for a step. Optional for mark_step command.",
            "type": "string",
         },
   },
   "required": ["command"],
   "additionalProperties": False,
}
</code></pre>
<p>下面我们来看下Plan创建、遍历、更新的整个流程</p>
<ol>
<li>创建初始Plan (create_initial_plan):</li>
</ol>
<ul>
<li>基于用户Query生成Plan (Steps)。</li>
<li>Prompt设计的几个亮点关键词：简洁有力，强调关键里程碑(Key Milestones)、可行动性(Actionable)、清晰度(Clarity)、效率(Efficiency)。</li>
</ul>
<pre><code class="language-python">system_message = Message.system_message(
"You are a planning assistant. Create a concise, actionable plan with clear steps. "
"Focus on key milestones rather than detailed sub-steps. "
"Optimize for clarity and efficiency."
)

# Create a user message with the request
user_message = Message.user_message(
f"Create a reasonable plan with clear steps to accomplish the task: {request}"
)
</code></pre>
<ul>
<li>效果评估：生成的Plan结构（Plan-Step两层）清晰，但内容质量（步骤逻辑、并行性）较基础，有优化空间。</li>
</ul>
<p><img alt="image" loading="lazy" src="https://img2024.cnblogs.com/blog/1326688/202507/1326688-20250710073828214-864079668.png" class="lazyload"></p>
<ol start="2">
<li>执行Plan (execute):</li>
</ol>
<ul>
<li>按顺序遍历Plan中的每个Step。</li>
<li>将当前Step标记为in_progress。</li>
<li>调用execute_step执行当前Step。</li>
</ul>
<ol start="3">
<li>执行单个Step (execute_step):</li>
</ol>
<ul>
<li>为当前Step实例化一个Manus智能体。</li>
<li>关键Context注入：这里同时提供全部plan status能解决（一部分）有些步骤模型会发散把多个步骤一起做了导致重复或者冲突的问题。
<ul>
<li>当前任务： "You are now working on step {index}: '{step_text}'"</li>
<li>全局状态： "CURRENT PLAN STATUS: {plan_status}" (包含所有Steps的状态)</li>
</ul>
</li>
</ul>
<pre><code class="language-python">step_prompt = f"""
CURRENT PLAN STATUS:
{plan_status}

YOUR CURRENT TASK:
You are now working on step {self.current_step_index}: "{step_text}"

Please execute this step using the appropriate tools. When you're done, provide a summary of what you accomplished.
"""
</code></pre>
<ol start="3">
<li>所有Plan执行完成进入汇总阶段：会基于原始生成的所有Plan的执行状态，让模型给出一份汇总</li>
</ol>
<p><strong>Flow的Context管理</strong></p>
<ul>
<li>分层Context：全局Plan状态 vs. 单个Step执行Context。</li>
<li>智能体隔离：每个Step由独立的Manus智能体执行，其Context主要包含：Plan全局状态 + 当前Step描述 + 当前Step执行历史 (ReAct循环)。</li>
<li>状态共享： Plan Status（所有Step状态）作为只读Context传递给每个执行Step的Manus智能体，有助于缓解步骤间冲突。</li>
<li>信息传递： Step间不直接共享详细推理/操作Context，仅通过Plan Status的宏观状态（完成/阻塞）和最终结果字符串进行间接传递。</li>
</ul>
<hr>
<p>Reference</p>
<ul>
<li>how we build our multi-agent system： Claude给出的多智能体构建智能</li>
<li>Don't build multi-agents：Devin创始人指出的多智能构建中的一些坑</li>
</ul>
<p>想看更全的大模型论文·微调预训练数据·开源框架·AIGC应用 >> DecryPrompt</p><br><br>
来源：https://www.cnblogs.com/gogoSandy/p/18976110

頁: [1]

圆梦公社's Archiver

解密prompt系列56.Agent context Engineering - 单智能体代码剖析