Reading and Writing Word Documents in Python with python-docx
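<p>The python-docx library can be used to create and edit Microsoft Word (.docx) files.<br>Official documentation: https://python-docx.readthedocs.io/en/latest/index.html</p><p>Notes:<br>.doc is Microsoft's proprietary file format; .docx is the format used from Microsoft Office 2007 onward. It is a compressed file format based on the Office Open XML standard, and .docx files usually take up less space than the equivalent .doc files. A .docx file is essentially a ZIP archive, so you can rename a .docx file to .zip and extract it. Inside,
word/document.xml holds most of the document's content, and image files are stored under word/media.<br>python-docx does not support .doc files; an indirect workaround is to convert the .doc file to .docx first in your code.</p>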
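<p>The ZIP layout described in the notes can be checked with Python's standard zipfile module, without renaming anything. The following is a minimal sketch, not part of python-docx: the helper names list_docx_parts, read_document_xml, and make_stand_in_archive are hypothetical, and the stand-in archive only mimics the folder layout (it is not a valid Word file) so that the example runs without a real document.</p>

```python
import io
import zipfile


def list_docx_parts(docx_file):
    """Return the names of the parts stored inside a .docx (ZIP) archive."""
    with zipfile.ZipFile(docx_file) as archive:
        return archive.namelist()


def read_document_xml(docx_file):
    """Return the raw XML of word/document.xml as a string."""
    with zipfile.ZipFile(docx_file) as archive:
        return archive.read("word/document.xml").decode("utf-8")


def make_stand_in_archive():
    """Build an in-memory ZIP that only mimics the .docx layout (assumption:
    this is for illustration and is NOT a document Word could open)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", "<document>hello</document>")
        archive.writestr("word/media/image1.png", b"")
    buffer.seek(0)
    return buffer


parts = list_docx_parts(make_stand_in_archive())
xml_text = read_document_xml(make_stand_in_archive())
print(parts)
print(xml_text)
```

<p>With a real file, both helpers also accept a path such as 'demo.docx' in place of the in-memory buffer.</p>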
<p><br><strong>1. Installing the package</strong></p>
<div class="cnblogs_code">
<pre>pip3 install python-docx</pre>
</div>
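<p><strong>2. Creating a Word document</strong></p>
<p>The code below is based on the example from the official documentation, with a few small changes and added comments explaining what each function does.</p>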
<div class="cnblogs_code">
<pre>from docx import Document
from docx.shared import Inches

document = Document()

# Add a heading and set its level (range 0 to 9, default 1)
document.add_heading('Document Title', 0)

# Add a paragraph; the text may contain tab (\t), newline (\n), or carriage return (\r) characters
p = document.add_paragraph('A plain paragraph having some ')
# Append runs of text to the paragraph; each run can have its own formatting
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

# Add bulleted list items (each preceded by a small dot)
document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
document.add_paragraph('second item in unordered list', style='List Bullet')

# Add numbered list items (each preceded by a number)
document.add_paragraph('first item in ordered list', style='List Number')
document.add_paragraph('second item in ordered list', style='List Number')

# Add a picture (monty-truth.png must exist in the working directory)
document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

# Add a table with one row and three columns.
# Available table styles include:
#   Normal Table
#   Table Grid
#   Light Shading, Light Shading Accent 1 to Light Shading Accent 6
#   Light List, Light List Accent 1 to Light List Accent 6
#   Light Grid, Light Grid Accent 1 to Light Grid Accent 6
#   (there are many more; the rest are omitted)
table = document.add_table(rows=1, cols=3, style='Light Shading Accent 2')

# Get the list of cells in the first row
hdr_cells = table.rows[0].cells
# The next three lines set the text of the three cells in that first row
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'

for qty, id, desc in records:
    # Add a row to the table and get the list of cells in that row
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = id
    row_cells[2].text = desc

document.add_page_break()

# Save the .docx document
document.save('demo.docx')</pre>
</div>
<p>The generated demo.docx looks like this:</p>
<p><img src="https://img2018.cnblogs.com/blog/201408/201908/201408-20190825124558801-561386196.png" alt=""></p>
<p><strong>3. Reading a Word document</strong></p>
<div class="cnblogs_code">
<pre>from docx import Document

doc = Document('demo.docx')

# Print the text of every paragraph
for para in doc.paragraphs:
    print(para.text)

# Print the index and text of every paragraph
for i, para in enumerate(doc.paragraphs):
    print(i, para.text)

# Tables
tbs = doc.tables
for tb in tbs:
    # Rows
    for row in tb.rows:
        # Cells in the row, one per column
        for cell in row.cells:
            print(cell.text)
            # Alternatively, build the text from the cell's paragraphs:
            # text = ''
            # for p in cell.paragraphs:
            #     text += p.text
            # print(text)</pre>
</div>
<p>Output:</p>
<div class="cnblogs_code">
<pre>Document Title
A plain paragraph having some bold and some italic.
Heading, level 1
Intense quote
first item in unordered list
second item in unordered list
first item in ordered list
second item in ordered list
0 Document Title
1 A plain paragraph having some bold and some italic.
2 Heading, level 1
3 Intense quote
4 first item in unordered list
5 second item in unordered list
6 first item in ordered list
7 second item in ordered list
8
9
Qty
Id
Desc
3
101
Spam
7
422
Eggs
4
631
Spam, spam, eggs, and spam
</pre>
</div>
Source: https://www.cnblogs.com/gdjlc/p/11407587.html