Qwen3.6-35B-A3B 全面评测：阿里开源模型如何超越前沿级水平

之一的之一 發表於 2026-4-17 19:05:00

Qwen3.6-35B-A3B 全面评测：阿里开源模型如何超越前沿级水平

<h1 id="qwen36-35b-a3b">Qwen3.6-35B-A3B 全面评测：阿里开源模型如何超越前沿级水平</h1>
<h2 id="tldr">TL;DR</h2>
<ul>
<li><strong>Qwen3.6-35B-A3B</strong> 是阿里 Qwen 团队2026年4月16日发布的最新开源模型，采用稀疏 MoE 架构，35B 总参数但每 token 仅激活 3B</li>
<li><strong>Apache 2.0 许可证</strong>，完全开源可商用</li>
<li>在 Terminal-Bench 2.0 得分 <strong>51.5</strong>（vs Gemma4-31B 的 42.9），SWE-bench Verified 得分 <strong>73.4</strong>，击败 Claude Sonnet 4.5</li>
<li>支持 <strong>262,144 token</strong> 上下文，专为代理编程（Agentic Coding）优化</li>
<li>24GB Mac 可通过 GGUF 量化运行</li>
</ul>
<hr>
<h2 id="qwen36-35b-a3b_1">Qwen3.6-35B-A3B 是什么？</h2>
<p><strong>Qwen3.6-35B-A3B</strong> 是阿里巴巴 Qwen 团队发布的最新开源模型，2026年4月16日正式上线。该模型专为<strong>代理编程</strong>（agentic coding）和<strong>仓库级代码推理</strong>设计。</p>
<p>模型名称解读：
- <strong>35B</strong> — 总参数数量（MoE 稀疏架构）
- <strong>A3B</strong> — 每 token <strong>仅激活 3B 参数</strong>，极大降低推理成本</p>
<p>这是典型的<strong>稀疏 MoE（Mixture-of-Experts）</strong> 架构——每个 token 只会激活少数「专家」模块的神经元，大部分参数处于休眠状态。结果：用 1/10 的推理算力，达到接近顶级dense模型的能力。</p>
<h3 id="apache-20">Apache 2.0 — 真开源</h3>
<p>不同于很多「伪开源」模型，Qwen3.6-35B-A3B 采用 <strong>Apache 2.0</strong> 许可证：
- ✅ 商用免费
- ✅ 无版税
- ✅ 可修改和再分发
- ✅ 包含专利授权</p>
<hr>
<h2 id="_1">基准测试表现</h2>
<p>Qwen3.6-35B-A3B 在多项基准测试中表现出色：</p>
<table>
<thead>
<tr>
<th>基准测试</th>
<th>Qwen3.6-35B-A3B</th>
<th>Gemma4-31B</th>
<th>Claude Sonnet 4.5</th>
</tr>
</thead>
<tbody>
<tr>
<td>Terminal-Bench 2.0</td>
<td><strong>51.5</strong></td>
<td>42.9</td>
<td>—</td>
</tr>
<tr>
<td>SWE-bench Pro</td>
<td><strong>49.5</strong></td>
<td>35.7</td>
<td>—</td>
</tr>
<tr>
<td>SWE-bench Verified</td>
<td><strong>73.4</strong></td>
<td>—</td>
<td>—</td>
</tr>
<tr>
<td>RealWorldQA</td>
<td><strong>85.3</strong></td>
<td>—</td>
<td>70.3</td>
</tr>
</tbody>
</table>
<p><strong>Terminal-Bench 2.0</strong>（代理式终端编程）得分 51.5，比 Gemma4-31B（42.9）高出 <strong>20%</strong>。</p>
<p><strong>SWE-bench Pro</strong>（真实 GitHub 仓库的软件工程能力）得分 49.5，比 Gemma4-31B（35.7）高出 <strong>38%</strong>。</p>
<p><strong>RealWorldQA</strong>（真实世界多模态理解）得分 85.3，击败 Claude Sonnet 4.5（70.3）<strong>21%</strong>。</p>
<hr>
<h2 id="_2">代理编程能力</h2>
<p>Qwen3.6-35B-A3B 的核心设计目标就是<strong>代理编程</strong>：</p>
<ol>
<li><strong>大型仓库导航</strong> — 理解项目结构、依赖、架构</li>
<li><strong>多文件多语言代码编写与修改</strong></li>
<li><strong>命令执行</strong> — 运行测试、构建系统、操作终端</li>
<li><strong>工具调用</strong> — 与 IDE（Continue.dev、Cursor）、API、CICD 系统集成</li>
</ol>
<h3 id="_3">思维模式保留</h3>
<p>Qwen3.6 的核心创新是<strong>思维模式保留</strong>——在多步骤代理工作流中保持完整推理上下文。这意味着：
- 决策一致性更强
- 减少重复推理，节省 token
- 在思考/非思考模式间优化 KV 缓存利用率</p>
<h3 id="262144-token">262,144 token 上下文</h3>
<p><strong>262,144 token</strong> 的超长上下文让 Qwen3.6 可以：
- 单次输入完整处理中大型代码仓库
- 跨数千行代码保持首尾一致的理解
- 对文件间依赖和架构模式进行推理</p>
<hr>
<h2 id="_4">如何本地运行</h2>
<h3 id="ollama">Ollama（最简单）</h3>
<pre class="codehilite"><code class="language-bash">brew install ollama
ollama run qwen3.6:35b-a3b
</code></pre>

<h3 id="unsloth-ggufmac">Unsloth GGUF（Mac 推荐）</h3>
<p>通过 4-bit 量化，24GB 内存的 Mac M3 Max 也能流畅运行。完整 F16 模型需要约 72GB，但 4-bit 量化后仅需 ~18-20GB VRAM。</p>
<h3 id="sglang">SGLang（生产级）</h3>
<pre class="codehilite"><code class="language-bash">python -m sglang.launch_server \
--model-path Qwen/Qwen3.6-35B-A3B \
--port 8000 --tp-size 8 \
--context-length 262144
</code></pre>

<hr>
<h2 id="_5">获取渠道</h2>
<table>
<thead>
<tr>
<th>平台</th>
<th>链接</th>
</tr>
</thead>
<tbody>
<tr>
<td>Hugging Face</td>
<td>Qwen/Qwen3.6-35B-A3B</td>
</tr>
<tr>
<td>Ollama</td>
<td><code>ollama run qwen3.6:35b-a3b</code></td>
</tr>
<tr>
<td>Unsloth GGUF</td>
<td>unsloth/Qwen3.6-35B-A3B-GGUF</td>
</tr>
<tr>
<td>Qwen Studio</td>
<td>https://qwen.ai</td>
</tr>
</tbody>
</table>
<hr>
<h2 id="_6">总结</h2>
<p><strong>Qwen3.6-35B-A3B 代表着开源 AI 的分水岭时刻</strong>——首次，开发者可以在本地运行一个：
- 3B 激活参数 + 35B 总参数的稀疏 MoE 模型
- Gemma4-31B 代理编程能力 <strong>+20%</strong>
- SOTA 级别编程基准测试表现
- 24GB Mac 可运行的本地部署
- Apache 2.0 完全开源可商用</p>
<p><strong>原文链接</strong>: Qwen3.6-35B-A3B Complete Review</p><br><br>
来源：https://www.cnblogs.com/sing1ee/p/19885253

頁: [1]

圆梦公社's Archiver

Qwen3.6-35B-A3B 全面评测：阿里开源模型如何超越前沿级水平