Cross-platform OCR application on Windows and Linux (Kylin OS)
<p align="left"><span style="font-size: 16px"><strong>1 </strong><strong>Results</strong></span></p><p align="left"> <span style="font-size: 16px"> Code download: https://pan.baidu.com/s/1NUfLTjk6kzXJKsaH7yo4qA?pwd=rk5c (extraction code: rk5c).</span></p>
<p align="left"><span style="font-size: 16px"> The OCR result on Kylin Desktop OS V10 (SP1) is shown below:</span></p>
<p><img src="https://img2024.cnblogs.com/blog/279374/202503/279374-20250303115433040-1364257055.png" alt="" style="display: block; margin-left: auto; margin-right: auto"></p>
<p><span style="font-size: 16px"><strong>2 </strong><strong>Installing the Tesseract OCR engine on Linux</strong></span></p>
<p align="left"><span style="font-size: 16px"><strong>2.1 </strong><strong>Download tesseract-ocr and leptonica</strong></span></p>
<div class="cnblogs_code">
<pre>https://codeload.github.com/tesseract-ocr/tesseract/tar.gz/5.2.0
http://www.leptonica.org/source/leptonica-1.82.0.tar.gz</pre>
</div>
<p align="left"><span style="font-size: 16px">The links above are for downloading in a browser. To download with wget on Linux instead:</span></p>
<div class="cnblogs_code">
<pre># -O saves the archive as tesseract-5.2.0.tar.gz, the name used by tar in 2.2
wget -O tesseract-5.2.0.tar.gz https://github.com/tesseract-ocr/tesseract/archive/5.2.0.tar.gz
wget http://www.leptonica.org/source/leptonica-1.82.0.tar.gz</pre>
</div>
<p align="left"><span style="font-size: 16px">Note the versions: this article uses tesseract 5.2.0 and leptonica 1.82.0.</span></p>
<p align="left"><span style="font-size: 16px">After downloading, upload both archives to a new directory on the Linux server, for example /home/wxzz.</span></p>
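<p align="left"><span style="font-size: 16px">If you prefer a script, you can wrap the download in a couple of shell functions. This is a sketch and is not part of the original project. The function names and the optional target-directory argument are assumptions. By default, wget names a file after the last part of its URL, so it would save the GitHub archive as 5.2.0.tar.gz. The sketch therefore uses -O to give the archive the name that the tar command in 2.2 expects.</span></p>

```shell
#!/usr/bin/env bash
# Versions used throughout this article; change them together if you upgrade.
TESS_VER=5.2.0
LEPT_VER=1.82.0

# Source archive URLs (the same ones as the wget commands in 2.1)
tesseract_url() { echo "https://github.com/tesseract-ocr/tesseract/archive/${TESS_VER}.tar.gz"; }
leptonica_url() { echo "http://www.leptonica.org/source/leptonica-${LEPT_VER}.tar.gz"; }

# Hypothetical helper: download both archives into a directory (default /home/wxzz).
# -O gives each file the name expected by "tar -xvf" in 2.2.
download_sources() {
  local dest=${1:-/home/wxzz}
  mkdir -p "$dest" || return 1
  wget -O "$dest/tesseract-${TESS_VER}.tar.gz" "$(tesseract_url)" || return 1
  wget -O "$dest/leptonica-${LEPT_VER}.tar.gz" "$(leptonica_url)"
}
```

<p align="left"><span style="font-size: 16px">Usage: source the script, then run download_sources /home/wxzz.</span></p>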
<p align="left"><span style="font-size: 16px"><strong>2.2 </strong><strong>Installation</strong></span></p>
<p align="left"><span style="font-size: 16px">Run the following commands in order:</span></p>
<div class="cnblogs_code">
<pre># Build and install Leptonica (installs under /usr/local by default)
cd /home/wxzz
tar -xvf leptonica-1.82.0.tar.gz
cd leptonica-1.82.0
./configure
make
sudo make install
# Tools needed by Tesseract's autogen.sh
sudo apt install automake
sudo apt install libtool
# Build and install Tesseract; return to the download directory first
cd /home/wxzz
tar -xvf tesseract-5.2.0.tar.gz
cd tesseract-5.2.0
./autogen.sh
./configure
make
sudo make install
# Refresh the shared-library cache so the new libraries can be loaded
sudo ldconfig</pre>
</div>
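<p align="left"><span style="font-size: 16px">To confirm that both libraries were installed with the expected versions, you can query pkg-config. This is a minimal sketch. It assumes that pkg-config is installed and can see /usr/local/lib/pkgconfig (see PKG_CONFIG_PATH in 2.3). Leptonica registers the pkg-config module lept, and Tesseract registers tesseract. check_pc_version is a hypothetical helper. The PKG_CONFIG variable only lets you point it at a different pkg-config binary.</span></p>

```shell
#!/usr/bin/env bash
# Which pkg-config to call (autotools-style override; defaults to pkg-config)
PKG_CONFIG=${PKG_CONFIG:-pkg-config}

# Hypothetical helper: compare the installed version of a pkg-config module
# with the version this article uses. Prints "module version OK" on success,
# writes a message to stderr and returns 1 otherwise.
check_pc_version() {
  local module=$1 expected=$2 found
  if ! found=$("$PKG_CONFIG" --modversion "$module" 2>/dev/null); then
    echo "$module: not found (check PKG_CONFIG_PATH)" >&2
    return 1
  fi
  if [ "$found" != "$expected" ]; then
    echo "$module: found $found, expected $expected" >&2
    return 1
  fi
  echo "$module $found OK"
}
```

<p align="left"><span style="font-size: 16px">Usage: run check_pc_version lept 1.82.0 and check_pc_version tesseract 5.2.0 after make install.</span></p>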
<p align="left"><span style="font-size: 16px"><strong>2.3 </strong><strong>Configure environment variables</strong></span></p>
<div class="cnblogs_code">
<pre># Open the file (editing /etc/profile requires root)
sudo vim /etc/profile
# Append the following lines at the end of the file
export LD_LIBRARY_PATH=/usr/local/lib
export LIBLEPT_HEADERSDIR=/usr/local/include
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
export TESSDATA_PREFIX=/usr/local/share/tessdata
# Apply the changes to the current shell
source /etc/profile</pre>
</div>
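<p align="left"><span style="font-size: 16px">Instead of editing /etc/profile by hand, you can write the same variables to a separate file. The sketch below is an assumption and differs from the original setup in two ways. First, it writes the file /etc/profile.d/tesseract.sh, which Debian/Ubuntu-based distributions read from /etc/profile for login shells; verify that your Kylin installation does the same. Second, it appends to any existing LD_LIBRARY_PATH and PKG_CONFIG_PATH instead of overwriting them. write_tess_env is a hypothetical helper.</span></p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the exports from 2.3 into a profile snippet.
# The quoted 'EOF' keeps $LD_LIBRARY_PATH etc. unexpanded until the file is sourced.
write_tess_env() {
  local target=${1:-/etc/profile.d/tesseract.sh}
  cat > "$target" <<'EOF'
# Tesseract / Leptonica installed under /usr/local (see 2.2)
export LD_LIBRARY_PATH=/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export LIBLEPT_HEADERSDIR=/usr/local/include
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}
export TESSDATA_PREFIX=/usr/local/share/tessdata
EOF
}
```

<p align="left"><span style="font-size: 16px">Usage: run write_tess_env as root. Then either log in again or run source /etc/profile.d/tesseract.sh.</span></p>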
<p align="left"><span style="font-size: 16px"><strong>2.4 </strong><strong>Download language packs</strong></span></p>
<div class="cnblogs_code">
<pre># Simplified Chinese
https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.00/chi_sim.traineddata
# English
https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.00/eng.traineddata</pre>
</div>
<p align="left"><span style="font-size: 16px">The language packs above are version 4.00. Upload the downloaded files to /usr/local/share/tessdata on the Linux server (the TESSDATA_PREFIX directory from 2.3).</span></p>
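<p align="left"><span style="font-size: 16px">You can also download the language packs directly on the server. This is a sketch. It assumes that wget is available and that you have write access to the tessdata directory. The directory defaults to /usr/local/share/tessdata when TESSDATA_PREFIX is not set. traineddata_url and fetch_langs are hypothetical helper names, and the URLs follow the 4.00 pattern listed above.</span></p>

```shell
#!/usr/bin/env bash
# Where tesseract looks for *.traineddata (TESSDATA_PREFIX from 2.3)
TESSDATA_DIR=${TESSDATA_PREFIX:-/usr/local/share/tessdata}

# URL of a language pack in the tessdata repository, 4.00 path as in 2.4
traineddata_url() { echo "https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.00/$1.traineddata"; }

# Hypothetical helper: download one or more language packs into TESSDATA_DIR
fetch_langs() {
  local lang
  mkdir -p "$TESSDATA_DIR" || return 1
  for lang in "$@"; do
    wget -O "$TESSDATA_DIR/$lang.traineddata" "$(traineddata_url "$lang")" || return 1
  done
}
```

<p align="left"><span style="font-size: 16px">Usage: fetch_langs chi_sim eng (run as root if the directory is not writable by your user).</span></p>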
<p align="left"><span style="font-size: 16px"><strong>2.5 </strong><strong>Verify the installation</strong></span></p>
<div class="cnblogs_code">
<pre>tesseract --version</pre>
</div>
<p align="left"><span style="font-size: 16px">If the installation succeeded, the output looks like this:</span></p>
<p><img src="https://img2024.cnblogs.com/blog/279374/202503/279374-20250303115014650-1916935590.png" alt="" loading="lazy" style="display: block; margin-left: auto; margin-right: auto"></p>
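<p align="left"><span style="font-size: 16px">Besides --version, tesseract --list-langs prints the language packs that tesseract can find, one per line. This is a quick way to confirm that step 2.4 worked. check_langs below is a hypothetical helper built around that option. It fails if tesseract is not on PATH or if any requested language is missing.</span></p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if tesseract is on PATH and lists every
# requested language. Stderr is merged in case the list is printed there.
check_langs() {
  local langs lang
  command -v tesseract >/dev/null 2>&1 || { echo "tesseract not found in PATH" >&2; return 1; }
  langs=$(tesseract --list-langs 2>&1) || return 1
  for lang in "$@"; do
    # -x: the whole line must equal the language name
    printf '%s\n' "$langs" | grep -qx "$lang" || { echo "missing language pack: $lang" >&2; return 1; }
  done
  echo "OK: $*"
}
```

<p align="left"><span style="font-size: 16px">Usage: check_langs chi_sim eng</span></p>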
<p align="left"><span style="font-size: 16px"><strong>2.6 </strong><strong>Test reading text from an image</strong></span></p>
<div class="cnblogs_code">
<pre>tesseract ocr.png output -l chi_sim</pre>
</div>
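<p align="left"><span style="font-size: 16px">The command may print error or warning messages, so check whether an output file was actually produced. Parameters:</span></p>
<p align="left"><span style="font-size: 16px">ocr.png: the image file to recognize</span></p>
<p align="left"><span style="font-size: 16px">output: the output base name; the recognized text is written to output.txt</span></p>
<p align="left"><span style="font-size: 16px">chi_sim: the language pack to use</span></p>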
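<p align="left"><span style="font-size: 16px">Messages on stderr do not necessarily mean that recognition failed, so a script should check the output file instead. Below is a sketch of that idea using a hypothetical helper, ocr_to_text. The third argument is passed to -l, where you can join several languages with +, for example chi_sim+eng.</span></p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: run tesseract and judge success by the output file,
# not by the messages it prints. stderr is kept in BASE.log for inspection.
# Usage: ocr_to_text IMAGE OUTPUT_BASE [LANGS]
ocr_to_text() {
  local img=$1 base=$2 lang=${3:-chi_sim}
  rm -f "$base.txt"   # do not mistake a file from an earlier run for success
  tesseract "$img" "$base" -l "$lang" 2>"$base.log"
  if [ -s "$base.txt" ]; then
    cat "$base.txt"
  else
    echo "no text written to $base.txt; see $base.log" >&2
    return 1
  fi
}
```

<p align="left"><span style="font-size: 16px">Usage: ocr_to_text ocr.png output chi_sim+eng</span></p>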
<p align="left"><span style="font-size: 16px"><strong>3 </strong><strong>Deploying the project</strong></span></p>
<p align="left"><span style="font-size: 16px"><strong>3.1 </strong><strong>Add the package reference</strong></span></p>
<p align="left"><span style="font-size: 16px">Create a new .NET 6 project, search for tesseract in NuGet, and add version 5.2.0 to the project, as shown below:</span></p>
<p align="left"><span style="font-size: 16px"><img src="https://img2024.cnblogs.com/blog/279374/202503/279374-20250303115204991-1706665655.png" alt="" style="display: block; margin-left: auto; margin-right: auto"></span></p>
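<p align="left"><span style="font-size: 16px">You can also create the same project from the command line with the .NET CLI. This is a sketch and assumes the .NET 6 SDK is installed. The project name LinuxOCR matches the namespace in 3.2, and new_ocr_project is a hypothetical helper.</span></p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a .NET 6 console project and add the
# Tesseract NuGet package at version 5.2.0 (the same package as in the NuGet UI).
new_ocr_project() {
  local name=${1:-LinuxOCR}
  # -n sets the project name and, by default, the output directory
  dotnet new console -n "$name" -f net6.0 || return 1
  dotnet add "$name/$name.csproj" package Tesseract --version 5.2.0
}
```
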
<p align="left"><span style="font-size: 16px"><strong>3.2 </strong><strong>Read text from an image</strong></span></p>
<p align="left"><span style="font-size: 16px"> The C# code is as follows:</span></p>
<div class="cnblogs_code">
<pre>using System;
using System.IO;
using Tesseract;

namespace LinuxOCR
{
    internal class Program
    {
        static string testImagePath = "ocr.png";

        static void Main(string[] args)
        {
            // Use the first command-line argument as the image path, if given
            string imagePath = args.Length > 0 ? args[0] : testImagePath;
            // Resolve tessdata next to LinuxOCR.dll instead of the current directory
            string tessDataPath = Path.Combine(AppContext.BaseDirectory, "tessdata");
            string textResult = String.Empty;
            // "eng" needs eng.traineddata in tessdata; use "chi_sim" or "chi_sim+eng" for Chinese text
            using (var engine = new TesseractEngine(tessDataPath, "eng", EngineMode.Default))
            using (var img = Pix.LoadFromFile(imagePath))
            using (var page = engine.Process(img))
            {
                textResult = page.GetText();
            }
            Console.WriteLine("Recognition result: " + textResult);
            Console.ReadLine();
        }
    }
}</pre>
</div>
<p align="left"><span style="font-size: 16px">The project directory structure is shown below:</span></p>
<p align="center"><span style="font-size: 16px"><img src="https://img2024.cnblogs.com/blog/279374/202503/279374-20250303115308743-1685850695.png" alt=""></span></p>
<p align="left"><span style="font-size: 16px">Note: the bin\Debug\net6.0 directory contains a tessdata directory. Its files are the language packs downloaded in step 2.4.</span></p>
<p align="left"><span style="font-size: 16px"><strong>3.3 </strong><strong>Add the files required on Linux</strong></span></p>
<p align="left"><span style="font-size: 16px">After the project is deployed to Linux, you must add two more files to its x64 directory: libleptonica-1.82.0.so and libtesseract50.so. Copy /usr/lib/x86_64-linux-gnu/libleptonica.so and /usr/local/lib/libtesseract.so from the Linux server into the x64 directory of the published project. Then rename them to libleptonica-1.82.0.so and libtesseract50.so respectively, as shown below:<br><img src="https://img2024.cnblogs.com/blog/279374/202503/279374-20250303115334643-54265401.png" alt="" style="display: block; margin-left: auto; margin-right: auto"></span></p>
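<p align="left"><span style="font-size: 16px">You can script the copy-and-rename step. In this sketch, copy_native_libs is a hypothetical helper whose default source paths are the ones given above. Pass other paths if the libraries are installed elsewhere on your system. The sketch uses cp -L so that a symlinked .so is copied as a real file; installed shared libraries are usually symlinks to versioned files.</span></p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the two native libraries into the published x64
# directory under the file names the article requires.
# Usage: copy_native_libs X64_DIR [LEPTONICA_SO] [TESSERACT_SO]
copy_native_libs() {
  local x64_dir=$1
  local lept_src=${2:-/usr/lib/x86_64-linux-gnu/libleptonica.so}
  local tess_src=${3:-/usr/local/lib/libtesseract.so}
  mkdir -p "$x64_dir" || return 1
  # -L: if a source is a symlink, copy the file it points to, not the link
  cp -L "$lept_src" "$x64_dir/libleptonica-1.82.0.so" || return 1
  cp -L "$tess_src" "$x64_dir/libtesseract50.so"
}
```

<p align="left"><span style="font-size: 16px">Usage: copy_native_libs publish/x64</span></p>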
<p align="left"><span style="font-size: 16px"><strong>4 </strong><strong>Run</strong></span></p>
<p align="left"><span style="font-size: 16px">After publishing the project to the publish directory, run dotnet LinuxOCR.dll on Kylin OS. The result is shown below:<br><img src="https://img2024.cnblogs.com/blog/279374/202503/279374-20250303115403153-974530853.png" alt="" style="display: block; margin-left: auto; margin-right: auto"></span></p>
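<p align="left"><span style="font-size: 16px">Before running, it can help to check that the files from 3.2 and 3.3 are in place. This is a sketch with a hypothetical helper, run_published. It makes three assumptions: the publish directory was produced with something like dotnet publish -c Release -o publish, it contains the tessdata directory and the two renamed libraries, and the .NET 6 runtime is installed on the Kylin machine.</span></p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: check a published LinuxOCR directory for the files
# this article depends on, then run the app from inside that directory.
run_published() {
  local dir=${1:-publish} f missing=0
  for f in LinuxOCR.dll tessdata x64/libleptonica-1.82.0.so x64/libtesseract50.so; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1
  # Subshell: change directory without affecting the caller; relative paths such as ocr.png resolve here
  ( cd "$dir" && dotnet LinuxOCR.dll )
}
```

<p align="left"><span style="font-size: 16px">Usage: run_published publish</span></p>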
<hr>
<p>IoT & Big Data Technology QQ group: 54256083</p>
<p align="left">IoT & Big Data Projects QQ group: 727664080</p>
<p align="left">QQ: 504547114</p>
<p align="left">WeChat: wxzz0151</p>
<p align="left">Blog: https://www.cnblogs.com/lsjwq</p>
<p align="left">WeChat official account: iNeuOS</p>
<p><img src="https://img2023.cnblogs.com/blog/279374/202312/279374-20231201092335752-2118744919.png" alt="" width="155" height="155"></p><br><br>
Source: https://www.cnblogs.com/lsjwq/p/18747917