Reading Word .docx content and handling images with PHP
<h2>Reading the text and images of a Word document with PHP, and saving them</h2><p>1. Install PHPWord with Composer</p>
<div class="cnblogs_code">
<pre>composer <span style="color: rgba(0, 0, 255, 1)">require</span> phpoffice/phpword</pre>
</div>
<p>Package page: https://packagist.org/packages/phpoffice/phpword</p>
<p> </p>
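<p>2. Read a docx document with PHPWord (<span style="color: rgba(255, 0, 0, 1)">note: the file must be in docx format; the doc format will not work</span>)</p>
<p>If your file is in doc format, simply re-save it as docx. If you have many doc files, you can download a batch conversion tool: http://www.batchwork.com/en/doc2doc/download.htm</p>
<p>If you have not set up autoloading yet, do that first:</p>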
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">require</span> './vendor/autoload.php';</pre>
</div>
<p>Load the document:</p>
<div class="cnblogs_code">
<pre><span style="color: rgba(128, 0, 128, 1)">$dir</span> = <span style="color: rgba(0, 128, 128, 1)">str_replace</span>('\\', '/', __DIR__) . '/'<span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(128, 0, 128, 1)">$source</span> = <span style="color: rgba(128, 0, 128, 1)">$dir</span> . 'test.docx'<span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(128, 0, 128, 1)">$phpWord</span> = \PhpOffice\PhpWord\IOFactory::load(<span style="color: rgba(128, 0, 128, 1)">$source</span>);</pre>
</div>
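<p>Because only docx files load this way, it can help to check the file before handing it to IOFactory::load(). The sketch below is one possible pre-check: a docx file is a ZIP archive, so it looks at the extension and at the ZIP signature bytes. The function name looksLikeDocx is hypothetical and not part of PHPWord.</p>
<div class="cnblogs_code">
<pre>// Hypothetical helper (not part of PHPWord): a cheap pre-check before IOFactory::load().
function looksLikeDocx($path)
{
    if (strtolower(pathinfo($path, PATHINFO_EXTENSION)) !== 'docx') {
        return false;
    }
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    // A docx file is a ZIP archive, and a non-empty ZIP starts with "PK\x03\x04".
    // A legacy doc file starts with different bytes, so a renamed doc is rejected here.
    $magic = fread($handle, 4);
    fclose($handle);
    return $magic === "PK\x03\x04";
}</pre>
</div>
<p>Calling looksLikeDocx($source) before loading lets you report a clear error for a doc file instead of letting the reader fail on it.</p>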
<p> </p>
<p>3. Key points</p>
<p>1) Alignment: \PhpOffice\PhpWord\Style\Paragraph -> getAlignment()</p>
<p>2) Font name: \PhpOffice\PhpWord\Style\Font -> getName()</p>
<p>3) Font size: \PhpOffice\PhpWord\Style\Font -> getSize()</p>
<p>4) Bold or not: \PhpOffice\PhpWord\Style\Font -> isBold()</p>
<p>5) Read an image: <span style="color: rgba(255, 0, 0, 1)">\PhpOffice\PhpWord\Element\Image -> getImageStringData()</span> (pass true as the argument to get the data base64-encoded)</p>
<p>6) Save base64 image data as an image file: <span style="color: rgba(255, 0, 0, 1)">file_put_contents($imageSrc, base64_decode($imageData))</span></p>
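<p>Point 6 can be wrapped in a small function that also creates the target directory and rejects data that does not decode. This is a minimal sketch in plain PHP; the name saveBase64Image and the decision to accept a full data URI as well as a bare base64 string are assumptions made for illustration.</p>
<div class="cnblogs_code">
<pre>// Hypothetical helper: decode base64 image data and write it to $path.
// Returns true on success, false if the data is invalid or cannot be written.
function saveBase64Image($imageData, $path)
{
    // Accept a full "data:image/png;base64,..." URI as well as a bare base64 string.
    if (strncmp($imageData, 'data:', 5) === 0) {
        $comma = strpos($imageData, ',');
        if ($comma !== false) {
            $imageData = substr($imageData, $comma + 1);
        }
    }
    // The base64 text may be split across lines; strip whitespace before strict decoding.
    $binary = base64_decode(preg_replace('/\s+/', '', $imageData), true);
    if ($binary === false) {
        return false;
    }
    $dir = dirname($path);
    if (!is_dir($dir)) {
        if (!mkdir($dir, 0777, true)) {
            return false;
        }
    }
    return file_put_contents($path, $binary) !== false;
}</pre>
</div>
<p>With this in place, the image step becomes saveBase64Image($imageData, $imageSrc), and a false return value tells you the image was not written.</p>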
<p> </p>
<p>4. Complete code</p>
<div class="cnblogs_code">
<pre>require './vendor/autoload.php';

function docx2html($source)
{
    // PHPWord returns UTF-8 strings, so no encoding conversion is needed.
    $phpWord = \PhpOffice\PhpWord\IOFactory::load($source);
    // Extracted images are written to ./images; create it if it is missing.
    if (!is_dir('images')) {
        mkdir('images', 0777, true);
    }
    $html = '';
    foreach ($phpWord->getSections() as $section) {
        foreach ($section->getElements() as $ele1) {
            // Not every element type has a paragraph style (tables, for example).
            $paragraphStyle = method_exists($ele1, 'getParagraphStyle') ? $ele1->getParagraphStyle() : null;
            if ($paragraphStyle instanceof \PhpOffice\PhpWord\Style\Paragraph) {
                // Word calls justified text "both"; CSS calls it "justify".
                $align = $paragraphStyle->getAlignment();
                $align = ($align === 'both') ? 'justify' : $align;
                $html .= '&lt;p style="text-align:' . $align . ';text-indent:20px;"&gt;';
            } else {
                $html .= '&lt;p&gt;';
            }
            if ($ele1 instanceof \PhpOffice\PhpWord\Element\TextRun) {
                foreach ($ele1->getElements() as $ele2) {
                    if ($ele2 instanceof \PhpOffice\PhpWord\Element\Text) {
                        $style = $ele2->getFontStyle();
                        $styleString = '';
                        // getFontStyle() may not return a Font object, so only read from one.
                        if ($style instanceof \PhpOffice\PhpWord\Style\Font) {
                            $fontFamily = $style->getName();
                            $fontSize = $style->getSize();
                            if ($fontFamily) {
                                $styleString .= "font-family:{$fontFamily};";
                            }
                            if ($fontSize) {
                                // Word font sizes are in points, so emit pt rather than px.
                                $styleString .= "font-size:{$fontSize}pt;";
                            }
                            if ($style->isBold()) {
                                $styleString .= 'font-weight:bold;';
                            }
                        }
                        // Escape the style and the text so special characters cannot break the markup.
                        $html .= sprintf('&lt;span style="%s"&gt;%s&lt;/span&gt;',
                            htmlspecialchars($styleString),
                            htmlspecialchars((string) $ele2->getText())
                        );
                    } elseif ($ele2 instanceof \PhpOffice\PhpWord\Element\Image) {
                        $imageSrc = 'images/' . md5($ele2->getSource()) . '.' . $ele2->getImageExtension();
                        // Passing true returns the image data base64-encoded.
                        $imageData = $ele2->getImageStringData(true);
                        // $imageData = 'data:' . $ele2->getImageType() . ';base64,' . $imageData;
                        file_put_contents($imageSrc, base64_decode($imageData));
                        $html .= '&lt;img src="' . $imageSrc . '" style="width:100%;height:auto"&gt;';
                    }
                }
            }
            $html .= '&lt;/p&gt;';
        }
    }
    return $html;
}

$dir = str_replace('\\', '/', __DIR__) . '/';
$source = $dir . 'test.docx';
echo docx2html($source);</pre>
</div>
<p> </p>
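<p>5. Notes</p>
<p>This is clearly a bare-bones example of reading a Word file: it only reads the paragraph alignment, the font name, size and bold flag of the text, and the images. Everything else, such as text color and line height, is ignored. If you need more, look through the PHPWord source and see which getter methods the classes under \PhpOffice\PhpWord\Style\xxx and \PhpOffice\PhpWord\Element\xxx provide.</p>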
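<p>As an illustration of reading a few more attributes, the sketch below turns the text color, italic flag and underline of a font style into inline CSS. It relies on the getColor(), isItalic() and getUnderline() getters of \PhpOffice\PhpWord\Style\Font. The function name extraFontCss is hypothetical, and the code assumes the color is a plain six-digit hex value; check the getters against your installed PHPWord version.</p>
<div class="cnblogs_code">
<pre>// Hypothetical helper: build inline CSS from a few more font attributes.
// $font is expected to be a \PhpOffice\PhpWord\Style\Font instance.
function extraFontCss($font)
{
    $css = '';
    // Assumption: the color is a six-digit hex string such as "FF0000" without "#".
    $color = $font->getColor();
    if (is_string($color)) {
        if (preg_match('/^[0-9a-fA-F]{6}$/', $color)) {
            $css .= 'color:#' . $color . ';';
        }
    }
    if ($font->isItalic()) {
        $css .= 'font-style:italic;';
    }
    // Any underline type other than "none" is rendered as a plain CSS underline.
    $underline = $font->getUnderline();
    if ($underline) {
        if ($underline !== 'none') {
            $css .= 'text-decoration:underline;';
        }
    }
    return $css;
}</pre>
</div>
<p>Inside docx2html, appending extraFontCss($style) to $styleString after the bold check would carry these attributes into the generated span.</p>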
<p> </p>
<p>6. Addendum (2020-07-21)</p>
<p>The following gives you the complete HTML directly:</p>
<div class="cnblogs_code">
<pre><span style="color: rgba(128, 0, 128, 1)">$phpWord</span> = \PhpOffice\PhpWord\IOFactory::load('xxx.docx'<span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(128, 0, 128, 1)">$xmlWriter</span> = \PhpOffice\PhpWord\IOFactory::createWriter(<span style="color: rgba(128, 0, 128, 1)">$phpWord</span>, "HTML"<span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(128, 0, 128, 1)">$html</span> = <span style="color: rgba(128, 0, 128, 1)">$xmlWriter</span>->getContent();</pre>
</div>
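<p>Note: the HTML includes the head section, so if you only need the style and body you have to extract them yourself. The images are embedded as base64, so saving them as files is also up to you.</p>
<p>For saving base64 data as an image, see the code above.</p>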
<p> </p>
<p>If you only want the content of the body, see the write method of \PhpOffice\PhpWord\Writer\HTML\Part\Body:</p>
<div class="cnblogs_code">
<pre><span style="color: rgba(128, 0, 128, 1)">$phpWord</span> = \PhpOffice\PhpWord\IOFactory::load('xxxx.docx'<span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(128, 0, 128, 1)">$htmlWriter</span> = \PhpOffice\PhpWord\IOFactory::createWriter(<span style="color: rgba(128, 0, 128, 1)">$phpWord</span>, "HTML"<span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(128, 0, 128, 1)">$content</span> = ''<span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 0, 255, 1)">foreach</span> (<span style="color: rgba(128, 0, 128, 1)">$phpWord</span>->getSections() <span style="color: rgba(0, 0, 255, 1)">as</span> <span style="color: rgba(128, 0, 128, 1)">$section</span><span style="color: rgba(0, 0, 0, 1)">) {
</span><span style="color: rgba(128, 0, 128, 1)">$writer</span> = <span style="color: rgba(0, 0, 255, 1)">new</span> \PhpOffice\PhpWord\Writer\HTML\Element\Container(<span style="color: rgba(128, 0, 128, 1)">$htmlWriter</span>, <span style="color: rgba(128, 0, 128, 1)">$section</span><span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(128, 0, 128, 1)">$content</span> .= <span style="color: rgba(128, 0, 128, 1)">$writer</span>-><span style="color: rgba(0, 0, 0, 1)">write();
}
</span><span style="color: rgba(0, 0, 255, 1)">echo</span> <span style="color: rgba(128, 0, 128, 1)">$content</span>;<span style="color: rgba(0, 0, 255, 1)">exit</span>;</pre>
</div>
<p> </p>
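<p>As for the images, I have not yet found a good way to handle them without modifying the library source. If you do modify it, the relevant code is in \PhpOffice\PhpWord\Writer\HTML\Element\Image:</p>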
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">public</span> <span style="color: rgba(0, 0, 255, 1)">function</span><span style="color: rgba(0, 0, 0, 1)"> write()
{
</span><span style="color: rgba(0, 0, 255, 1)">if</span> (!<span style="color: rgba(128, 0, 128, 1)">$this</span>-><span style="color: rgba(0, 0, 0, 1)">element instanceof ImageElement) {
</span><span style="color: rgba(0, 0, 255, 1)">return</span> ''<span style="color: rgba(0, 0, 0, 1)">;
}
</span><span style="color: rgba(128, 0, 128, 1)">$content</span> = ''<span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(128, 0, 128, 1)">$imageData</span> = <span style="color: rgba(128, 0, 128, 1)">$this</span>->element->getImageStringData(<span style="color: rgba(0, 0, 255, 1)">true</span><span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(0, 0, 255, 1)">if</span> (<span style="color: rgba(128, 0, 128, 1)">$imageData</span> !== <span style="color: rgba(0, 0, 255, 1)">null</span><span style="color: rgba(0, 0, 0, 1)">) {
</span><span style="color: rgba(128, 0, 128, 1)">$styleWriter</span> = <span style="color: rgba(0, 0, 255, 1)">new</span> ImageStyleWriter(<span style="color: rgba(128, 0, 128, 1)">$this</span>->element-><span style="color: rgba(0, 0, 0, 1)">getStyle());
</span><span style="color: rgba(128, 0, 128, 1)">$style</span> = <span style="color: rgba(128, 0, 128, 1)">$styleWriter</span>-><span style="color: rgba(0, 0, 0, 1)">write();
</span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> $imageData = 'data:' . $this->element->getImageType() . ';base64,' . $imageData;</span>
<span style="color: rgba(128, 0, 128, 1)">$imageSrc</span> = 'images/' . <span style="color: rgba(0, 128, 128, 1)">md5</span>(<span style="color: rgba(128, 0, 128, 1)">$this</span>->element->getSource()) . '.' . <span style="color: rgba(128, 0, 128, 1)">$this</span>->element-><span style="color: rgba(0, 0, 0, 1)">getImageExtension();
</span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Handle the image yourself here, e.g. upload it to OSS</span>
<span style="color: rgba(0, 128, 128, 1)">file_put_contents</span>(<span style="color: rgba(128, 0, 128, 1)">$imageSrc</span>, <span style="color: rgba(0, 128, 128, 1)">base64_decode</span>(<span style="color: rgba(128, 0, 128, 1)">$imageData</span><span style="color: rgba(0, 0, 0, 1)">));
</span><span style="color: rgba(128, 0, 128, 1)">$content</span> .= <span style="color: rgba(128, 0, 128, 1)">$this</span>-><span style="color: rgba(0, 0, 0, 1)">writeOpening();
</span><span style="color: rgba(128, 0, 128, 1)">$content</span> .= "&lt;img border=\"0\" style=\"{<span style="color: rgba(128, 0, 128, 1)">$style</span>}\" src=\"{<span style="color: rgba(128, 0, 128, 1)">$imageSrc</span>}\"/&gt;"<span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(128, 0, 128, 1)">$content</span> .= <span style="color: rgba(128, 0, 128, 1)">$this</span>-><span style="color: rgba(0, 0, 0, 1)">writeClosing();
}
</span><span style="color: rgba(0, 0, 255, 1)">return</span> <span style="color: rgba(128, 0, 128, 1)">$content</span><span style="color: rgba(0, 0, 0, 1)">;
}</span></pre>
</div>
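<p>One possible workaround that leaves the files under vendor untouched is to post-process the generated HTML: find each base64 data URI in an img tag, write it to a file, and rewrite the src. The sketch below uses plain PHP and a regular expression; the name externalizeDataUriImages is hypothetical, and the pattern assumes the writer emits src="data:image/...;base64,..." with double quotes, as the commented-out line in the code above suggests. For very large documents a DOM parser may be more robust than a regular expression.</p>
<div class="cnblogs_code">
<pre>// Hypothetical helper: write every base64 data-URI image found in $html into $dir
// and point the src attribute at the saved file instead.
function externalizeDataUriImages($html, $dir)
{
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    // Possessive quantifier (*+) keeps matching linear on long base64 payloads.
    $pattern = '~src="data:image/([a-zA-Z0-9]+);base64,([^"]*+)"~';
    $result = preg_replace_callback($pattern, function ($m) use ($dir) {
        $binary = base64_decode(preg_replace('/\s+/', '', $m[2]), true);
        if ($binary === false) {
            return $m[0]; // leave anything that does not decode untouched
        }
        // Name the file after a hash of its content so identical images share one file.
        $ext = ($m[1] === 'jpeg') ? 'jpg' : $m[1];
        $file = rtrim($dir, '/') . '/' . md5($binary) . '.' . $ext;
        file_put_contents($file, $binary);
        return 'src="' . $file . '"';
    }, $html);
    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result === null ? $html : $result;
}</pre>
</div>
<p>For example, externalizeDataUriImages($htmlWriter->getContent(), 'images') would return the same HTML with the images stored under images/. The callback is also a natural place to upload to OSS instead of writing a local file.</p>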
<p> </p>
<hr>
<p> </p>
<p>The end.</p><br><br>
Source: https://www.cnblogs.com/tujia/p/12133615.html