Deploying PaddleOCR Efficiently with JYPPX.DeploySharp: Multiple High-Performance OCR Options

Posted by 米米吖 on 2026-1-28 22:40:00

<h1 id="使用-jyppxdeploysharp-高效部署-paddleocr解锁多种高性能-ocr-文字识别方案">Deploying PaddleOCR Efficiently with JYPPX.DeploySharp: Multiple High-Performance OCR Options</h1>
<blockquote>
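<p>This article shows how to deploy PaddleOCR models in a .NET environment with the DeploySharp framework. It supports several inference engines, including OpenVINO, TensorRT, and ONNX Runtime, and on suitable hardware reaches text recognition latencies on the order of 100 ms.</p>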
</blockquote>
<hr>
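<h2 id="目录">Contents</h2>
<ul>
<li>1. Introduction</li>
<li>2. Core Technical Principles</li>
<li>3. DeploySharp Architecture Advantages</li>
<li>4. Supported Inference Devices</li>
<li>5. Quick Start Guide</li>
<li>6. Performance Testing and Analysis</li>
<li>7. FAQ</li>
<li>8. Getting the Software</li>
<li>9. Technical Support</li>
</ul>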
<hr>
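<h2 id="一前言">1. Introduction</h2>
<p>OCR (optical character recognition) plays an important role in digital office work, document management, receipt and invoice recognition, and similar scenarios. <strong>PaddleOCR</strong>, open-sourced by Baidu PaddlePaddle, is a leading OCR framework that is popular with developers for its recognition accuracy and rich feature set.</p>
<p>A year ago, building on my own OpenVINO C# API project, I used OpenVINO to deploy the PaddleOCR model family on .NET and released the <strong>PaddleOCR-OpenVINO-CSharp</strong> project. Thanks to OpenVINO's inference optimizations on CPU, that project performs image text recognition, layout analysis, and table recognition on CPU alone, with inference time kept under 300 ms.</p>
<p>As the project grew and its use cases diversified, a single inference engine could no longer cover every need. Recently I wrapped the mainstream inference tools (OpenVINO, TensorRT, ONNX Runtime, and others) behind one interface and released the open-source <strong>DeploySharp</strong> project. Its core strengths are:</p>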
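<ul>
<li><strong>Unified interface</strong>: a low-level interface abstraction lets one codebase target multiple inference engines</li>
<li><strong>Flexible deployment</strong>: developers can pick the best inference option for their actual hardware</li>
<li><strong>Performance</strong>: each engine's hardware acceleration is used to the full</li>
</ul>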
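<p>Because DeploySharp unifies the low-level interface, developers can now switch freely between OpenVINO, TensorRT, ONNX Runtime, and other inference engines with the same code. We recently finished adding PaddleOCR model support, giving .NET developers a complete OCR solution.</p>
<p>PaddleOCR support is now part of the DeploySharp open-source project (the code has been pushed to the repository; the NuGet package is in preparation). To let everyone try the new PaddleOCR quickly, we prepared the <strong>JYPPX.DeploySharp.OpenCvSharp.PaddleOcr.TestDemo</strong> demo program, which runs out of the box without complex configuration.</p>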
<hr>
<h2 id="二核心技术原理解析">2. Core Technical Principles</h2>
<h3 id="21-paddleocr-工作流程">2.1 PaddleOCR Workflow</h3>
<p>PaddleOCR uses the classic three-stage "detection, classification, recognition" pipeline:</p>
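<pre><code>Input image
     │
     ▼
┌──────────────────────┐
│ Text detection       │ → locate the text regions in the image
│ (Detection)          │
└──────────────────────┘
     │
     ▼
┌──────────────────────┐
│ Text orientation     │ → determine the text orientation (e.g. 180° flip)
│ (Classifier)         │
└──────────────────────┘
     │
     ▼
┌──────────────────────┐
│ Text recognition     │ → read the content of each text region
│ (Recognition)        │
└──────────────────────┘
     │
     ▼
Recognition result
</code></pre>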
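<p>The three-stage flow above can be sketched in a few lines. This is only an illustration in Python with stub stages: the names <code>TextLine</code>, <code>run_pipeline</code>, and the <code>fake_*</code> functions are hypothetical and are not part of PaddleOCR or DeploySharp.</p>

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    box: tuple      # (x, y, width, height) of the detected region
    rotated: bool   # True if the classifier says the crop is upside down
    text: str       # recognized characters


def run_pipeline(image, detect, classify, recognize):
    """Chain the three stages: detect boxes, fix orientation, read text."""
    results = []
    for box in detect(image):
        crop = (image, box)                  # placeholder for a real crop
        rotated = classify(crop)
        if rotated:
            crop = ("rotated180", crop)      # placeholder for a real 180-degree rotation
        results.append(TextLine(box, rotated, recognize(crop)))
    return results


# Stub stages standing in for the real models (assumptions for illustration).
def fake_detect(image):
    return [(0, 0, 100, 20), (0, 30, 80, 20)]


def fake_classify(crop):
    return crop[1][1] == 30                  # pretend the second line is flipped


def fake_recognize(crop):
    return "flipped line" if crop[0] == "rotated180" else "normal line"


lines = run_pipeline("image.png", fake_detect, fake_classify, fake_recognize)
for line in lines:
    print(line.box, line.rotated, line.text)
```

<p>Each stage only consumes the output of the previous one, which is why the stages can be timed and tuned separately.</p>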
<h3 id="22-三阶段模型详解">2.2 The Three Stage Models</h3>
<table>
<thead>
<tr>
<th>Stage</th>
<th>Model</th>
<th>Input</th>
<th>Output</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>Detection</td>
<td>PP-OCRv5_det</td>
<td>Original image (3xHxW)</td>
<td>Text box coordinates</td>
<td>Locate text regions</td>
</tr>
<tr>
<td>Classification</td>
<td>PP-OCRv5_cls</td>
<td>Cropped text box (3x80x160)</td>
<td>Orientation label</td>
<td>Correct text orientation</td>
</tr>
<tr>
<td>Recognition</td>
<td>PP-OCRv5_rec</td>
<td>Cropped text box (3x48xL, variable width L)</td>
<td>Text content</td>
<td>Recognize the character sequence</td>
</tr>
</tbody>
</table>
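<p>The recognition input has a fixed height of 48 and a variable width L. As a rough sketch of how L could be derived, the function below assumes the crop is scaled to height 48 with its aspect ratio preserved. That resize rule is an assumption, not something the article states; the width limits 48 and 1024 are taken from the recognition shape profile used in the trtexec commands later in this article.</p>

```python
import math


def rec_input_width(crop_h, crop_w, target_h=48, min_w=48, max_w=1024):
    """Width of the recognition input after scaling the crop to target_h.

    Assumption: aspect-ratio-preserving resize, clamped to [min_w, max_w].
    """
    scaled = math.ceil(crop_w * target_h / crop_h)
    return max(min_w, min(scaled, max_w))


print(rec_input_width(24, 200))    # a 200x24 crop scaled up to height 48
print(rec_input_width(48, 5000))   # a very long line is capped at max_w
print(rec_input_width(96, 20))     # a very narrow crop is padded to min_w
```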
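<h3 id="23-性能优化策略">2.3 Performance Optimization Strategies</h3>
<ol>
<li><strong>Model quantization</strong>: INT8 quantization shrinks the model and speeds up inference</li>
<li><strong>Dynamic batching</strong>: Batch Size &gt; 1 is supported to raise GPU utilization</li>
<li><strong>Concurrent inference</strong>: multi-threaded concurrent processing makes full use of multiple cores</li>
<li><strong>Hardware acceleration</strong>: the best compute backend is chosen for each kind of hardware</li>
</ol>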
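<p>To make the batching strategy concrete, here is a minimal sketch of splitting the cropped text boxes into batches before they are sent to the classifier or recognizer. The helper name <code>make_batches</code> is hypothetical and does not come from DeploySharp.</p>

```python
def make_batches(items, batch_size):
    """Split items into consecutive batches of at most batch_size elements."""
    size = max(1, int(batch_size))
    return [items[i:i + size] for i in range(0, len(items), size)]


crops = [f"crop_{i}" for i in range(10)]
batches = make_batches(crops, 4)
print([len(b) for b in batches])
```

<p>With ten crops and a batch size of 4, the model is invoked three times instead of ten, and the last batch is simply smaller.</p>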
<hr>
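<h2 id="三deploysharp-架构优势">3. DeploySharp Architecture Advantages</h2>
<p>The core design idea of DeploySharp is "<strong>unified interface, flexible deployment</strong>". Its architecture is shown in the diagram below:</p>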
<pre><code>┌─────────────────────────────────────────────────────────┐
│                    Application layer                    │
│  PaddleOCR text recognition / other model applications  │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│              DeploySharp abstraction layer              │
│ Unified model loading / inference / resource management │
└─────────────────────────────────────────────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ OpenVINO Engine │ │ TensorRT Engine │ │  ONNX Runtime   │
│ (CPU-optimized) │ │(GPU-accelerated)│ │(cross-platform) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
         │                   │                   │
         ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│    Intel CPU    │ │   NVIDIA GPU    │ │ Multiple devices│
│                 │ │                 │ │  (CPU/GPU/DML)  │
└─────────────────┘ └─────────────────┘ └─────────────────┘
</code></pre>
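<p><strong>Main advantages:</strong></p>
<ul>
<li><strong>Switching without code changes</strong>: changing the inference engine requires no change to business code</li>
<li><strong>Efficient resource use</strong>: model lifetime and compute resources are managed automatically</li>
<li><strong>Extensible</strong>: support for new inference engines is easy to add</li>
<li><strong>Production ready</strong>: thoroughly tested and usable directly in production</li>
</ul>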
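<p>The "unified interface" idea can be illustrated with a small sketch. It is written in Python purely for illustration, and every name in it (<code>InferenceEngine</code>, <code>create_engine</code>, and so on) is hypothetical. It is not DeploySharp's actual API, which is a C# library.</p>

```python
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    """Hypothetical common interface; not DeploySharp's real API."""

    @abstractmethod
    def load(self, model_path):
        ...

    @abstractmethod
    def infer(self, inputs):
        ...


class OpenVinoEngine(InferenceEngine):
    def load(self, model_path):
        self.model_path = model_path

    def infer(self, inputs):
        return f"openvino:{self.model_path}:{len(inputs)}"


class OnnxRuntimeEngine(InferenceEngine):
    def load(self, model_path):
        self.model_path = model_path

    def infer(self, inputs):
        return f"onnxruntime:{self.model_path}:{len(inputs)}"


# Registry that maps a backend name to its implementation.
ENGINES = {"openvino": OpenVinoEngine, "onnxruntime": OnnxRuntimeEngine}


def create_engine(name, model_path):
    engine = ENGINES[name]()
    engine.load(model_path)
    return engine


def recognize(engine, images):
    """Business code depends only on the abstract interface."""
    return engine.infer(images)


for backend in ("openvino", "onnxruntime"):
    print(recognize(create_engine(backend, "det.onnx"), ["img1", "img2"]))
```

<p>Only the string passed to <code>create_engine</code> changes between backends; <code>recognize</code> stays the same, which is the property the list above describes.</p>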
<hr>
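<h2 id="四支持的推理设备">4. Supported Inference Devices</h2>
<p>The demo program supports several mainstream inference backends, covering everything from entry-level devices to high-performance servers:</p>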
<table>
<thead>
<tr>
<th>Inference engine</th>
<th>Supported devices</th>
<th>Typical use</th>
<th>Performance characteristics</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>OpenVINO</strong></td>
<td>CPU, Intel iGPU, AUTO</td>
<td>Machines without a discrete GPU, Intel processors</td>
<td>CPU-optimized, fast startup, stable</td>
</tr>
<tr>
<td><strong>TensorRT</strong></td>
<td>CUDA 11/12</td>
<td>High-performance NVIDIA GPU scenarios</td>
<td>GPU-accelerated, highest performance, requires model conversion</td>
</tr>
<tr>
<td><strong>ONNX Runtime CPU</strong></td>
<td>CPU</td>
<td>Cross-platform deployment</td>
<td>Very general, moderate performance</td>
</tr>
<tr>
<td><strong>ONNX Runtime CUDA</strong></td>
<td>CUDA 12</td>
<td>Deployment on NVIDIA GPUs</td>
<td>GPU-accelerated, no model conversion needed</td>
</tr>
<tr>
<td><strong>ONNX Runtime TensorRT</strong></td>
<td>CUDA 12</td>
<td>High-performance NVIDIA GPU scenarios</td>
<td>GPU acceleration plus TensorRT optimization</td>
</tr>
<tr>
<td><strong>ONNX Runtime DML</strong></td>
<td>DirectML-capable GPU</td>
<td>Multi-vendor GPUs on Windows</td>
<td>Supports AMD/NVIDIA/Intel GPUs</td>
</tr>
</tbody>
</table>
<blockquote>
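<p><strong>Performance tip</strong>: the first model load and the first inference are slow. This is expected and is caused by model initialization and JIT compilation. Avoid rapid repeated operations during the first run; performance improves markedly once the model has warmed up.</p>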
</blockquote>
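<p>Because the first call includes one-time initialization, a fair measurement discards it before timing. The sketch below shows the idea with a stub in place of a real model; <code>benchmark</code> and <code>fake_infer</code> are hypothetical names, and the default of ten timed runs mirrors the demo's time test.</p>

```python
import time


def benchmark(infer, runs=10, warmup=1):
    """Return (mean latency in ms, per-run samples) after discarding warm-up calls."""
    for _ in range(warmup):
        infer()                      # not timed: includes one-time initialization
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sum(samples) / len(samples), samples


calls = []


def fake_infer():
    """Stub standing in for a real inference call."""
    calls.append(1)


mean_ms, samples = benchmark(fake_infer, runs=10, warmup=2)
print(f"mean over {len(samples)} runs: {mean_ms:.4f} ms")
```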
<hr>
<h2 id="五快速开始指南">5. Quick Start Guide</h2>
<h3 id="51-程序界面概览">5.1 Program Interface Overview</h3>
<p>After the program starts, the main window looks like this:</p>
<img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813653-1755028179.png">
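<p><strong>Core controls:</strong></p>
<table>
<thead>
<tr>
<th>Control</th>
<th>Description</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inference backend (推理后端)</td>
<td>Selects the inference engine to use</td>
<td>Reload the model after switching</td>
</tr>
<tr>
<td>Model path (模型路径)</td>
<td>Preset model path; usually does not need to change</td>
<td>Custom model paths are supported</td>
</tr>
<tr>
<td>Image path (图像路径)</td>
<td>Selects the image to recognize</td>
<td>Supports JPG, PNG, BMP, and other formats</td>
</tr>
<tr>
<td>Load model (加载模型)</td>
<td>Loads the selected model into memory</td>
<td>Required before first use</td>
</tr>
<tr>
<td>Infer image (推理图片)</td>
<td>Runs recognition on a single image</td>
<td>The first run serves as warm-up</td>
</tr>
<tr>
<td>Time test (时间测试)</td>
<td>Runs ten consecutive inferences and reports the average time</td>
<td>For performance evaluation</td>
</tr>
<tr>
<td>Concurrency (并发数量)</td>
<td>Sets the number of concurrent inference threads</td>
<td>Reload the model after changing it</td>
</tr>
<tr>
<td>BatchSize</td>
<td>Batch size</td>
<td>Can be changed on the fly</td>
</tr>
</tbody>
</table>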
<hr>
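<h3 id="52-openvino-推理">5.2 OpenVINO Inference</h3>
<p>OpenVINO is Intel's open-source toolkit, deeply optimized for CPUs and Intel integrated GPUs (iGPU), which makes it a good fit for high-performance inference on machines without a discrete GPU.</p>
<p><strong>Steps for CPU:</strong></p>
<p>1. Run the program</p>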
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813669-2069610707.png"></p>
<p>2. Select <strong>OpenVINO</strong> in the "Inference backend" drop-down</p>
<p>3. Click "Load model"</p>
<p>4. Click "Infer image" to start recognition</p>
<img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813694-1191031576.png">
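<p><strong>Steps for iGPU:</strong></p>
<p>The workflow for Intel integrated graphics is the same as above; the only difference is that the device must be set to <strong>GPU0</strong>:</p>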
<img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813653-923590906.png">
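<p><strong>Steps for mixed devices:</strong></p>
<p>On machines with both a CPU and an iGPU, OpenVINO offers an <strong>AUTO</strong> mode in which it picks the device on its own based on what is available. The workflow is the same as above; just set the device to <strong>AUTO</strong>:</p>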
<img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813665-1585718457.png">
<p><strong>When to use it:</strong></p>
<ul>
<li>Server deployments</li>
<li>Low-power devices</li>
<li>Intel CPU users</li>
<li>Scenarios that need fast startup</li>
</ul>
<hr>
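<h3 id="53-onnx-runtime-cpu-推理">5.3 ONNX Runtime CPU Inference</h3>
<p>ONNX Runtime is Microsoft's cross-platform inference engine with support for multiple hardware acceleration backends. Its CPU mode works without any extra dependencies.</p>
<p><strong>Steps:</strong></p>
<ol>
<li>Run the program</li>
<li>Select <strong>ONNX Runtime CPU</strong> in the "Inference backend" drop-down</li>
<li>Click "Load model"</li>
<li>Click "Infer image" to start recognition</li>
</ol>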
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813644-1843775817.png"></p>
<p><strong>When to use it:</strong></p>
<ul>
<li>Cross-platform deployment</li>
<li>Environments without GPU acceleration</li>
<li>Quick prototyping</li>
</ul>
<hr>
<h3 id="54-onnx-runtime-cuda-推理">5.4 ONNX Runtime CUDA Inference</h3>
<p>CUDA is NVIDIA's parallel computing platform; it uses the GPU's parallelism to deliver a significant speedup.</p>
<h4 id="配置步骤">Configuration steps</h4>
<ol>
<li>
<p><strong>Install CUDA</strong></p>
<ul>
<li>Visit the NVIDIA CUDA website</li>
<li>Download and install a CUDA 12.x release (tested with CUDA 12.3)</li>
</ul>
</li>
<li>
<p><strong>Copy the dependency files</strong></p>
<p>Copy the following CUDA-related DLL files into the program's working directory:</p>
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813683-448077110.png"></p>
</li>
<li>
<p><strong>Start inference</strong></p>
<p>Run the program and select <strong>ONNX Runtime CUDA</strong> in the "Inference backend" drop-down</p>
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813700-1885280932.png"></p>
</li>
</ol>
<p><strong>Dependencies:</strong></p>
<table>
<thead>
<tr>
<th>NuGet package</th>
<th>Version</th>
</tr>
</thead>
<tbody>
<tr>
<td>Microsoft.ML.OnnxRuntime.Gpu.Windows</td>
<td>1.23.0</td>
</tr>
<tr>
<td>Microsoft.ML.OnnxRuntime.Managed</td>
<td>1.23.0</td>
</tr>
</tbody>
</table>
<p><strong>When to use it:</strong></p>
<ul>
<li>Machines with an NVIDIA graphics card</li>
<li>Higher inference speed requirements</li>
<li>Quick deployment without model conversion</li>
</ul>
<hr>
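<h3 id="55-onnx-runtime-tensorrt-推理">5.5 ONNX Runtime TensorRT Inference</h3>
<p>TensorRT is NVIDIA's high-performance deep learning inference optimizer. Combined with CUDA acceleration, it delivers the highest performance.</p>
<h4 id="配置步骤-1">Configuration steps</h4>
<p>Copy the dependency files in the same way as for CUDA mode.</p>
<h4 id="使用步骤">Usage steps</h4>
<ol>
<li>Run the program</li>
<li>Select <strong>ONNX Runtime TensorRT</strong> in the "Inference backend" drop-down</li>
<li>Click "Load model"</li>
<li>Click "Infer image" to start recognition</li>
</ol>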
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813627-173016715.png"></p>
<blockquote>
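<p><strong>Important</strong>: on the first inference run, TensorRT automatically optimizes and compiles the ONNX model. This can take several minutes, so please be patient. The compiled engine file is cached, and subsequent runs are much faster.</p>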
</blockquote>
<p><strong>Dependencies:</strong></p>
<table>
<thead>
<tr>
<th>NuGet package</th>
<th>Version</th>
</tr>
</thead>
<tbody>
<tr>
<td>Microsoft.ML.OnnxRuntime.Gpu.Windows</td>
<td>1.23.2</td>
</tr>
<tr>
<td>Microsoft.ML.OnnxRuntime.Managed</td>
<td>1.23.2</td>
</tr>
</tbody>
</table>
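<p><strong>When to use it:</strong></p>
<ul>
<li>Production environments with very strict latency requirements</li>
<li>NVIDIA GPU devices</li>
<li>A long compilation step on the first run is acceptable</li>
</ul>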
<hr>
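<h3 id="56-onnx-runtime-dml-推理">5.6 ONNX Runtime DML Inference</h3>
<p>DirectML (DML) is a high-performance hardware acceleration interface on Windows that supports GPUs from AMD, NVIDIA, and Intel.</p>
<h4 id="配置步骤-2">Configuration steps</h4>
<p>Copy the DML-related DLL files into the program's working directory:</p>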
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813611-139393382.png"></p>
<h4 id="使用步骤-1">Usage steps</h4>
<ol>
<li>Run the program</li>
<li>Select <strong>ONNX Runtime DML</strong> in the "Inference backend" drop-down</li>
<li>Click "Load model"</li>
<li>Click "Infer image" to start recognition</li>
</ol>
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813659-1655618200.png"></p>
<p><strong>When to use it:</strong></p>
<ul>
<li>Windows users</li>
<li>AMD graphics card users</li>
<li>One interface is needed across several GPU brands</li>
</ul>
<hr>
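<h3 id="57-tensorrtsharp-推理">5.7 TensorRTSharp Inference</h3>
<p>TensorRTSharp is a C# wrapper around NVIDIA TensorRT. It loads and runs TensorRT engines natively and supports FP16 precision for further speedups.</p>
<h4 id="环境准备">Environment setup</h4>
<p>For a detailed installation and configuration guide, see:</p>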
<pre><code>https://mp.weixin.qq.com/s/D0c6j5MmraJO4Eza7tWm1A
</code></pre>
<p>TensorRTSharp supports both the CUDA 11 and CUDA 12 series; choose the DLL files that match the CUDA version installed on your system.</p>
<h4 id="配置步骤-3">Configuration steps</h4>
<ol>
<li>
<p><strong>Replace the DLL files</strong></p>
<p>Copy the TensorRT DLL files that match your installed CUDA version into the program directory:</p>
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813615-1275332341.jpg"></p>
</li>
<li>
<p><strong>Convert the models</strong></p>
<p>Use the <code>trtexec</code> tool to convert the ONNX models into TensorRT engine files:</p>
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813694-718172996.jpg"></p>
</li>
</ol>
<h4 id="模型转换指令">Model conversion commands</h4>
<p><strong>Text detection model (Det):</strong></p>
<pre><code class="language-bash">trtexec.exe --onnx=PP-OCRv5_mobile_det_onnx.onnx \
--minShapes=x:1x3x32x32 \
--optShapes=x:4x3x640x640 \
--maxShapes=x:8x3x960x960 \
--fp16 \
--memPoolSize=workspace:1024 \
--sparsity=disable \
--saveEngine=PP-OCRv5_mobile_det_f16_onnx.engine
</code></pre>
<p><strong>Text classification model (Cls):</strong></p>
<pre><code class="language-bash">trtexec.exe --onnx=PP-OCRv5_mobile_cls_onnx.onnx \
--minShapes=x:1x3x80x160 \
--optShapes=x:8x3x80x160 \
--maxShapes=x:64x3x80x160 \
--fp16 \
--memPoolSize=workspace:1024 \
--sparsity=disable \
--saveEngine=PP-OCRv5_mobile_cls_f16_onnx.engine
</code></pre>
<p><strong>Text recognition model (Rec):</strong></p>
<pre><code class="language-bash">trtexec.exe --onnx=PP-OCRv5_mobile_rec_onnx.onnx \
--minShapes=x:1x3x48x48 \
--optShapes=x:8x3x48x1024 \
--maxShapes=x:64x3x48x1024 \
--fp16 \
--memPoolSize=workspace:1024 \
--sparsity=disable \
--saveEngine=PP-OCRv5_mobile_rec_f16_onnx.engine
</code></pre>
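<p>The three commands above share the same flags and differ only in the file names and the min/opt/max shape profile. As an illustration, the sketch below assembles the same argument list from those values. The helper <code>trtexec_args</code> is a hypothetical convenience function; it only reuses flags that already appear in the commands above and does not run anything.</p>

```python
def trtexec_args(onnx_path, engine_path, input_name,
                 min_shape, opt_shape, max_shape, workspace_mib=1024):
    """Build the trtexec argument list for an FP16 engine with a dynamic-shape profile."""

    def fmt(shape):
        # (64, 3, 48, 1024) becomes "x:64x3x48x1024" for input name "x"
        return f"{input_name}:" + "x".join(str(d) for d in shape)

    return [
        "trtexec.exe",
        f"--onnx={onnx_path}",
        f"--minShapes={fmt(min_shape)}",
        f"--optShapes={fmt(opt_shape)}",
        f"--maxShapes={fmt(max_shape)}",
        "--fp16",
        f"--memPoolSize=workspace:{workspace_mib}",
        "--sparsity=disable",
        f"--saveEngine={engine_path}",
    ]


rec_args = trtexec_args(
    "PP-OCRv5_mobile_rec_onnx.onnx",
    "PP-OCRv5_mobile_rec_f16_onnx.engine",
    "x",
    (1, 3, 48, 48), (8, 3, 48, 1024), (64, 3, 48, 1024),
)
print(" ".join(rec_args))
```

<p>The resulting list can be passed to a process launcher such as <code>subprocess.run</code>, or joined into a single command line as printed here.</p>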
<h4 id="开始推理">Start inference</h4>
<p>Once the models are converted, select the corresponding <code>.engine</code> files in the program to start inference:</p>
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813690-867069194.jpg"></p>
<p><strong>When to use it:</strong></p>
<ul>
<li>Maximum inference performance is the goal</li>
<li>NVIDIA GPU environments</li>
<li>Offline model conversion is acceptable</li>
</ul>
<hr>
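<h2 id="六性能测试与分析">6. Performance Testing and Analysis</h2>
<h3 id="61-性能测试工具">6.1 Performance Testing Tool</h3>
<p>The demo program has a built-in performance testing tool with two modes:</p>
<ol>
<li><strong>Overall timing</strong>: the full end-to-end time from image input to result output</li>
<li><strong>Per-stage breakdown</strong>: the time spent in preprocessing, inference, and postprocessing for each stage</li>
</ol>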
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813646-333754545.jpg"></p>
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813643-1070139529.jpg"></p>
<h3 id="62-tensorrtsharp-性能示例">6.2 TensorRTSharp Performance Example</h3>
<p>The following timing data was recorded with TensorRTSharp at a concurrency of 4:</p>
<pre><code>Inference time: 53 ms

---- Detection ----

Inference Time Records:
Index Preprocess(ms) Inference(ms) Postprocess(ms) Total(ms)
1       2.01          6.37          0.57          8.96
2       2.23          5.51          0.68          8.43

---- Classification ----

Device/Worker 0:
Inference Time Records:
Index Preprocess(ms) Inference(ms) Postprocess(ms) Total(ms)
1       1.84          6.89          0.00          8.73
2       1.99          6.97          0.01          8.96

Device/Worker 1:
Inference Time Records:
Index Preprocess(ms) Inference(ms) Postprocess(ms) Total(ms)
1       1.79          6.66          0.00          8.46
2       1.66          7.60          0.00          9.26

Device/Worker 2:
Inference Time Records:
Index Preprocess(ms) Inference(ms) Postprocess(ms) Total(ms)
1       1.61          5.31          0.00          6.92
2       1.51          8.01          0.00          9.53

Device/Worker 3:
Inference Time Records:
Index Preprocess(ms) Inference(ms) Postprocess(ms) Total(ms)
1       1.24          7.73          0.00          8.98
2       1.82          8.35          0.00          10.17

---- Recognition ----

Device/Worker 0:
Inference Time Records:
Index Preprocess(ms) Inference(ms) Postprocess(ms) Total(ms)
1       0.00          41.97          1.42          43.39
2       0.00          14.50          2.30          16.81

Device/Worker 1:
Inference Time Records:
Index Preprocess(ms) Inference(ms) Postprocess(ms) Total(ms)
1       0.00          47.40          6.81          54.21
2       0.00          19.42          2.76          22.18

Device/Worker 2:
Inference Time Records:
Index Preprocess(ms) Inference(ms) Postprocess(ms) Total(ms)
1       0.00          38.10          3.42          41.52
2       0.00          22.36          3.37          25.73

Device/Worker 3:
Inference Time Records:
Index Preprocess(ms) Inference(ms) Postprocess(ms) Total(ms)
1       0.00          109.94          4.58          114.52
2       0.00          26.59          4.55          31.14
</code></pre>
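<p>Logs like this are easier to compare once each column is averaged. The sketch below does that for the two Detection records shown above; the numbers are copied from the log, while the helper name <code>stage_means</code> is hypothetical.</p>

```python
from statistics import mean

# (preprocess, inference, postprocess, total) in ms, copied from the Detection records
detection = [
    (2.01, 6.37, 0.57, 8.96),
    (2.23, 5.51, 0.68, 8.43),
]


def stage_means(records):
    """Average each timing column across all records."""
    names = ("preprocess", "inference", "postprocess", "total")
    return {name: mean(column) for name, column in zip(names, zip(*records))}


summary = stage_means(detection)
for name, value in summary.items():
    print(f"{name}: {value:.2f} ms")
```

<p>The same function can be applied to the per-worker Classification and Recognition records to see which stage dominates the end-to-end time.</p>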
<h3 id="63-性能对比总结">6.3 Performance Comparison Summary</h3>
<p>The table below lists the average time over 10 runs on the shampoo test image:</p>
<table>
<thead>
<tr>
<th>Inference engine</th>
<th>Device</th>
<th>Average time</th>
<th>Hardware</th>
</tr>
</thead>
<tbody>
<tr>
<td>OpenVINO</td>
<td>CPU</td>
<td>288 ms</td>
<td>Intel(R) Core(TM) Ultra 9 288V, 8 cores</td>
</tr>
<tr>
<td>OpenVINO</td>
<td>iGPU</td>
<td>99 ms</td>
<td>Intel(R) Arc(TM) 140V GPU (16GB)</td>
</tr>
<tr>
<td>OpenVINO</td>
<td>AUTO (iGPU + CPU)</td>
<td>100 ms</td>
<td>Intel(R) Core(TM) Ultra 9 288V, 8 cores<br>Intel(R) Arc(TM) 140V GPU (16GB)</td>
</tr>
<tr>
<td>ONNX Runtime</td>
<td>CPU</td>
<td>656 ms</td>
<td>AMD Ryzen 7 5800H with Radeon Graphics, 8 cores</td>
</tr>
<tr>
<td>ONNX Runtime DML</td>
<td>GPU</td>
<td>114 ms</td>
<td>NVIDIA GeForce RTX 3060 Laptop GPU</td>
</tr>
<tr>
<td>ONNX Runtime DML</td>
<td>iGPU</td>
<td>331 ms</td>
<td>Intel(R) Arc(TM) 140V GPU (16GB)</td>
</tr>
<tr>
<td>ONNX Runtime CUDA</td>
<td>GPU</td>
<td>93 ms</td>
<td>NVIDIA GeForce RTX 3060 Laptop GPU</td>
</tr>
<tr>
<td>ONNX Runtime TensorRT</td>
<td>GPU</td>
<td>52 ms</td>
<td>NVIDIA GeForce RTX 3060 Laptop GPU</td>
</tr>
<tr>
<td>TensorRTSharp</td>
<td>GPU</td>
<td>51 ms</td>
<td>NVIDIA GeForce RTX 3060 Laptop GPU</td>
</tr>
</tbody>
</table>
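<p>The table mixes several machines, so only rows measured on the same hardware are directly comparable. As a worked example, the sketch below computes relative speedups for the four rows measured on the RTX 3060 Laptop GPU, using ONNX Runtime CUDA as the baseline. The latencies are copied from the table; the helper name <code>speedups</code> is hypothetical.</p>

```python
# Average latency in ms on the NVIDIA GeForce RTX 3060 Laptop GPU, from the table
rtx3060_ms = {
    "ONNX Runtime DML": 114,
    "ONNX Runtime CUDA": 93,
    "ONNX Runtime TensorRT": 52,
    "TensorRTSharp": 51,
}


def speedups(latencies, baseline):
    """Speedup of every entry relative to the baseline entry (higher is faster)."""
    base = latencies[baseline]
    return {name: round(base / ms, 2) for name, ms in latencies.items()}


relative = speedups(rtx3060_ms, "ONNX Runtime CUDA")
for name, factor in relative.items():
    print(f"{name}: {factor}x")
```

<p>On this GPU the two TensorRT-based paths are roughly 1.8 times as fast as plain CUDA, and DirectML is somewhat slower than CUDA.</p>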
<blockquote>
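<p><strong>Call for benchmark results</strong>: we welcome test data from other developers. Please post your test configuration (hardware model, concurrency, Batch Size) and measured times in the comments; we will compile them into a benchmark comparison table later.</p>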
</blockquote>
<hr>
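<h2 id="七常见问题解答">7. FAQ</h2>
<h3 id="q1-首次推理为什么特别慢">Q1: Why is the first inference so slow?</h3>
<p><strong>A:</strong> The first inference has to do the following:</p>
<ul>
<li>Load the model into memory</li>
<li>Initialize the inference engine</li>
<li>Run JIT compilation (some engines)</li>
</ul>
<p>This is expected; subsequent inferences are much faster.</p>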
<hr>
<h3 id="q2-如何选择合适的推理引擎">Q2: How do I choose the right inference engine?</h3>
<p><strong>A:</strong> Choose according to your hardware and requirements:</p>
<table>
<thead>
<tr>
<th>Scenario</th>
<th>Recommended engine</th>
</tr>
</thead>
<tbody>
<tr>
<td>No GPU, Intel CPU, stability matters</td>
<td>OpenVINO</td>
</tr>
<tr>
<td>Intel GPU, cross-platform needed</td>
<td>OpenVINO</td>
</tr>
<tr>
<td>No GPU, cross-platform needed</td>
<td>ONNX Runtime CPU</td>
</tr>
<tr>
<td>NVIDIA graphics card, quick deployment</td>
<td>ONNX Runtime CUDA</td>
</tr>
<tr>
<td>NVIDIA graphics card, maximum performance</td>
<td>ONNX Runtime TensorRT / TensorRTSharp</td>
</tr>
<tr>
<td>Windows, AMD graphics card</td>
<td>ONNX Runtime DML</td>
</tr>
</tbody>
</table>
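<p>The selection table can also be read as a small decision function. The sketch below encodes the rows above; the function name <code>recommend_engine</code>, its parameters, and the order in which the rules are checked are illustrative assumptions.</p>

```python
def recommend_engine(gpu=None, cpu="other", windows=False, prefer_speed=False):
    """Return the engine suggested by the table for a simple hardware description.

    gpu: "nvidia", "amd", "intel", or None; cpu: "intel" or "other".
    """
    if gpu == "nvidia":
        if prefer_speed:
            return "ONNX Runtime TensorRT / TensorRTSharp"
        return "ONNX Runtime CUDA"
    if gpu == "amd" and windows:
        return "ONNX Runtime DML"
    if gpu == "intel" or cpu == "intel":
        return "OpenVINO"
    return "ONNX Runtime CPU"


print(recommend_engine(gpu="nvidia", prefer_speed=True))
print(recommend_engine(gpu="amd", windows=True))
print(recommend_engine(cpu="intel"))
print(recommend_engine())
```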
<hr>
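<h3 id="q3-切换推理引擎时为什么需要重新加载模型">Q3: Why does the model have to be reloaded when switching inference engines?</h3>
<p><strong>A:</strong> Each inference engine uses its own internal representation of the model and its own optimization strategy, so the model has to be parsed and loaded again. Click "Load model" to complete the switch.</p>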
<hr>
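<h3 id="q4-batchsize-和并发数量有什么区别">Q4: What is the difference between BatchSize and concurrency?</h3>
<p><strong>A:</strong> The two parameters do different things:</p>
<ul>
<li><strong>BatchSize</strong>: the number of images processed in a single inference call; it raises GPU utilization</li>
<li><strong>Concurrency</strong>: the number of inference engines running at the same time. Setting it to N creates N engine instances that run in parallel, which raises multi-core CPU utilization</li>
</ul>
<p>Changing BatchSize does not require reloading the model, but changing the concurrency does.</p>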
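<p>The interplay of the two parameters can be sketched as follows: the inputs are first cut into batches, and the batches are then distributed over a pool of workers. This is a simplified illustration. A thread pool and a stub function stand in for the separate engine instances that the demo creates, and the names <code>run_concurrent</code> and <code>fake_infer_batch</code> are hypothetical.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrent(crops, batch_size, workers, infer_batch):
    """Split crops into batches and process the batches on `workers` threads."""
    batches = [crops[i:i + batch_size] for i in range(0, len(crops), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(infer_batch, batches))   # map keeps the input order
    return [text for batch in results for text in batch]


def fake_infer_batch(batch):
    """Stub for one batched inference call."""
    return [crop.upper() for crop in batch]


texts = run_concurrent(["a", "b", "c", "d", "e"], batch_size=2, workers=4,
                       infer_batch=fake_infer_batch)
print(texts)
```

<p>BatchSize controls how much work each call carries, while the worker count controls how many calls are in flight at once.</p>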
<hr>
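<h3 id="q5-tensorrt-模型转换失败怎么办">Q5: What if the TensorRT model conversion fails?</h3>
<p><strong>A:</strong> Check the following:</p>
<ol>
<li>Make sure the CUDA version matches the TensorRT version</li>
<li>Check that the ONNX model file is complete</li>
<li>Confirm that the input shape ranges in the <code>trtexec</code> parameters are reasonable</li>
<li>If GPU memory is insufficient, reduce the <code>--memPoolSize</code> value</li>
</ol>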
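<p>For the third point, one check that can be automated is that every dimension of the profile is ordered min, opt, max. The sketch below performs that check; <code>profile_is_valid</code> is a hypothetical helper and not part of TensorRT.</p>

```python
def profile_is_valid(min_shape, opt_shape, max_shape):
    """True when all shapes have the same rank and each dimension is non-decreasing."""
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        return False
    return all(sorted(dims) == list(dims)
               for dims in zip(min_shape, opt_shape, max_shape))


# Detection profile from the conversion commands: valid
print(profile_is_valid((1, 3, 32, 32), (4, 3, 640, 640), (8, 3, 960, 960)))
# opt width larger than max width: invalid
print(profile_is_valid((1, 3, 48, 48), (8, 3, 48, 2048), (64, 3, 48, 1024)))
```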
<hr>
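<h3 id="q6-推理结果为空或识别不准确怎么办">Q6: What if the result is empty or the recognition is inaccurate?</h3>
<p><strong>A:</strong> Common causes and fixes:</p>
<ol>
<li><strong>Image quality</strong>: check whether the image is blurry, skewed, or poorly lit</li>
<li><strong>Input size</strong>: make sure the image size meets the model's input requirements</li>
<li><strong>Language support</strong>: confirm that the model supports the target language</li>
<li><strong>Model version</strong>: try a different version of the PaddleOCR models</li>
</ol>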
<hr>
<h2 id="八软件获取">8. Getting the Software</h2>
<h3 id="81-源码下载">8.1 Source Code</h3>
<p>The DeploySharp project is fully open source and available here:</p>
<p><strong>Main repository:</strong></p>
<pre><code>https://github.com/guojin-yan/DeploySharp.git
</code></pre>
<p><strong>PaddleOCR demo program:</strong></p>
<pre><code>https://github.com/guojin-yan/DeploySharp/tree/DeploySharpV1.0/applications/JYPPX.DeploySharp.OpenCvSharp.PaddleOcr
</code></pre>
<h3 id="82-可执行程序">8.2 Executable</h3>
<p>To get a prebuilt executable directly, join the technical discussion group and download the latest version from the group files.</p>
<hr>
<h2 id="九技术支持">9. Technical Support</h2>
<h3 id="91-反馈与交流">9.1 Feedback and Discussion</h3>
<ul>
<li><strong>GitHub Issues</strong>: open an Issue or Pull Request in the project repository</li>
<li><strong>QQ group</strong>: join <strong>945057948</strong> for real-time technical support</li>
</ul>
<p><img src="https://img2024.cnblogs.com/blog/2933426/202601/2933426-20260128223813606-1067743997.png"></p>
<h3 id="92-相关资源">9.2 Related Resources</h3>
<ul>
<li><strong>PaddleOCR official project</strong>: https://github.com/PaddlePaddle/PaddleOCR</li>
<li><strong>OpenVINO official documentation</strong>: https://docs.openvino.ai/</li>
<li><strong>TensorRT official documentation</strong>: https://docs.nvidia.com/deeplearning/tensorrt/</li>
<li><strong>ONNX Runtime official documentation</strong>: https://onnxruntime.ai/docs/</li>
</ul>
<hr>
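<h2 id="结语">Closing Remarks</h2>
<p>With the DeploySharp framework, PaddleOCR can be deployed efficiently in a .NET environment. Whether the goal is stable operation on CPU alone or maximum performance with GPU acceleration, developers can choose the option that fits their needs.</p>
<p>Going forward, we will keep optimizing the framework and add support for more model types and inference engines, to give .NET developers a more complete AI model deployment solution.</p>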
<hr>
<p><em>Author: Guojin Yan</em><br>
<em>Last updated: January 2026</em></p>
<hr>
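<p><strong>[Article statement]</strong></p>
<p>This article is based mainly on the author's own research and practice; some of the wording was polished with the help of AI tools. Because of technical limitations, it may contain errors or omissions, and corrections from readers are welcome. If any content unintentionally infringes your rights, please contact us through the official account backend and we will verify and handle it promptly. Thank you for your understanding and support!</p><br><br>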
Source: https://www.cnblogs.com/guojin-blogs/p/19545866
