Two Common Ways to Convert Word to PDF in Java, with Performance Benchmarks
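<div id="navCategory"><h5 class="catalogue">Contents</h5><ul class="first_class_ul"><li>1. Test Goals and Principles</li><li>2. Test Environment</li><li>3. Project Structure</li><li>4. Maven Dependencies (pom.xml)</li><li>5. LibreOffice-to-PDF Implementation</li><li>6. docx4j-to-PDF Implementation</li><li>7. Statistics Helper (P50 / P95)</li><li>8. Benchmark Driver (BenchMain)</li><li>9. Results</li><li>10. Why Is the Gap So Large?</li><li>11. Conclusion</li></ul></div><p>This article compares two common ways to convert Word documents to PDF in a Java backend: docx4j (pure Java) and LibreOffice (which drives a native office-suite engine). Runnable Java code is used to measure per-conversion time, cold and warm start times, and stability. The test environment is JDK 17 with an 8-core CPU and 16 GB of RAM, and the input is a 20-page Chinese Word document with text, images and tables. The project includes the complete Maven dependency configuration and implementations of both approaches; the docx4j version follows the XSL-FO + Apache FOP route and handles Chinese font mapping in detail. The results show how the two approaches perform in a realistic scenario.</p><p>The two approaches compared:</p><ul><li><strong>docx4j (pure Java implementation)</strong></li><li><strong>LibreOffice (native office engine)</strong></li></ul>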
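<p>Many discussions stop at "I heard LibreOffice is faster", with <strong>no code, no data, and no way to reproduce the claim</strong>.</p>
<p><strong>This article uses runnable Java code to measure the real performance gap between docx4j and LibreOffice.</strong></p>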
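<p class="maodian"></p><h2>1. Test Goals and Principles</h2>
<p><strong>Goals</strong></p>
<ul><li>Compare <strong>per-conversion time</strong></li><li>Compare <strong>cold start / warm start</strong></li><li>Compare <strong>stability (failure rate)</strong></li></ul>
<p><strong>Principles</strong></p>
<ul><li>The same set of <code>.docx</code> files for both tools</li><li>Serial execution (to compare the cost of a single conversion)</li><li>No Spring / web layer to distort the numbers</li><li>Timing taken directly with <code>System.nanoTime()</code></li></ul>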
<p class="maodian"></p><h2>2. Test Environment</h2>
<ul><li>JDK: 17</li><li>CPU: 8 cores</li><li>Memory: 16 GB</li><li>OS: Windows</li><li>LibreOffice: 25.x</li><li>docx4j: 11.x (11.4.8 in the pom below)</li><li>Document: a 20-page Word file (images, tables, headers/footers, Chinese text)</li></ul>
<p class="maodian"></p><h2>3. Project Structure</h2>
<blockquote><p>pdf-bench/<br /> ├─ pom.xml<br /> ├─ samples/<br /> │ └─ sample.docx<br /> └─ src/main/java/com/example/bench/<br /> ├─ BenchMain.java<br /> ├─ LibreOfficeConverter.java<br /> ├─ Docx4jConverter.java<br /> └─ Stats.java</p></blockquote>
<p class="maodian"></p><h2>4. Maven Dependencies (pom.xml)</h2>
<div class="jb51code"><pre class="brush:xml;"><dependencies>
    <!-- docx4j -->
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-core</artifactId>
        <version>11.4.8</version>
    </dependency>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
        <version>11.4.8</version>
    </dependency>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-export-fo</artifactId>
        <version>11.4.8</version>
    </dependency>
    <!-- Apache FOP -->
    <dependency>
        <groupId>org.apache.xmlgraphics</groupId>
        <artifactId>fop</artifactId>
        <version>2.9</version>
    </dependency>
</dependencies>
</pre></div>
<p class="maodian"></p><h2>5. LibreOffice-to-PDF Implementation</h2>
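<div class="jb51code"><pre class="brush:java;">import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

public class LibreOfficeConverter {
    private final String soffice;

    public LibreOfficeConverter(String soffice) {
        this.soffice = soffice; // "soffice" if on PATH, otherwise the full path to the executable
    }

    public void convert(Path docx, Path outDir) throws IOException, InterruptedException {
        // LibreOffice writes outDir/(input name without extension).pdf
        String name = docx.getFileName().toString();
        int dot = name.lastIndexOf('.');
        Path pdf = outDir.resolve((dot > 0 ? name.substring(0, dot) : name) + ".pdf");
        Files.deleteIfExists(pdf); // a stale PDF from an earlier run must not hide a failure

        ProcessBuilder pb = new ProcessBuilder(
                soffice,
                "--headless",
                "--nologo",
                "--nolockcheck",
                "--nodefault",
                "--nofirststartwizard",
                "--convert-to", "pdf",
                "--outdir", outDir.toString(),
                docx.toString()
        );
        pb.redirectErrorStream(true);
        // Discard the console output so a full pipe buffer can never block soffice
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        Process p = pb.start();
        // 120 s is an arbitrary upper bound; tune it for your documents
        if (!p.waitFor(120, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("LibreOffice timed out converting " + docx);
        }
        int code = p.exitValue();
        // soffice can exit with 0 without writing a PDF, so check the output file too
        if (code != 0 || !Files.exists(pdf)) {
            throw new IOException("LibreOffice failed, exit=" + code + ", output=" + pdf);
        }
    }
}
</pre></div>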
<p>On Windows, replace <code>soffice</code> with <code>C:\Program Files\LibreOffice\program\soffice.exe</code> (escape the backslashes when writing it as a Java string literal).</p>
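<p>To avoid editing the path by hand on each machine, the executable can be chosen per operating system. A minimal sketch, assuming LibreOffice sits in its default install location on Windows and macOS and that <code>soffice</code> is on <code>PATH</code> everywhere else; <code>SofficeLocator</code> and <code>defaultSoffice</code> are hypothetical names, not part of any library:</p>

```java
import java.util.Locale;

// Hypothetical helper: picks a default soffice executable per OS.
// The paths are LibreOffice's default install locations; adjust them if yours differ.
final class SofficeLocator {

    private SofficeLocator() {
    }

    static String defaultSoffice(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            // Backslashes must be escaped inside a Java string literal
            return "C:\\Program Files\\LibreOffice\\program\\soffice.exe";
        }
        if (os.contains("mac")) {
            return "/Applications/LibreOffice.app/Contents/MacOS/soffice";
        }
        // Linux and others: assume soffice is on PATH
        return "soffice";
    }

    static String defaultSoffice() {
        return defaultSoffice(System.getProperty("os.name", ""));
    }
}
```

<p>In <code>BenchMain</code>, <code>new LibreOfficeConverter(SofficeLocator.defaultSoffice())</code> would then replace the hard-coded <code>"soffice"</code>. Taking the OS name as a parameter keeps the method testable on any machine.</p>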
<p class="maodian"></p><h2>6. docx4j-to-PDF Implementation</h2>
<div class="jb51code"><pre class="brush:java;">import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import org.docx4j.Docx4J;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.PhysicalFont;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class Docx4jConverter {

    public void convert(Path docx, Path pdf) throws Exception {
        // 1. Load the Word document
        WordprocessingMLPackage wordMLPackage =
                WordprocessingMLPackage.load(docx.toFile());

        // 2. Font mapping (the critical part for Chinese documents)
        IdentityPlusMapper fontMapper = new IdentityPlusMapper();

        // SimSun (宋体) is the mandatory fallback font
        PhysicalFont simsun = PhysicalFonts.get("SimSun");
        if (simsun == null) {
            throw new IllegalStateException("SimSun font not found; make sure SimSun (宋体) is installed");
        }
        fontMapper.put("SimSun", simsun);

        // ===== Common Chinese fonts: Word font name -> physical font name =====
        map(fontMapper, "隶书", "LiSu", simsun);
        map(fontMapper, "宋体", "SimSun", simsun);
        map(fontMapper, "微软雅黑", "Microsoft YaHei", simsun);
        map(fontMapper, "黑体", "SimHei", simsun);
        map(fontMapper, "楷体", "KaiTi", simsun);
        map(fontMapper, "新宋体", "NSimSun", simsun);
        map(fontMapper, "华文行楷", "STXingkai", simsun);
        map(fontMapper, "华文仿宋", "STFangsong", simsun);
        map(fontMapper, "仿宋", "FangSong", simsun);
        map(fontMapper, "幼圆", "YouYuan", simsun);
        map(fontMapper, "华文宋体", "STSong", simsun);
        map(fontMapper, "华文中宋", "STZhongsong", simsun);
        map(fontMapper, "等线", "SimSun", simsun);
        map(fontMapper, "等线 Light", "SimSun", simsun);
        map(fontMapper, "华文琥珀", "STHupo", simsun);
        map(fontMapper, "华文隶书", "STLiti", simsun);
        map(fontMapper, "华文新魏", "STXinwei", simsun);
        map(fontMapper, "华文彩云", "STCaiyun", simsun);
        map(fontMapper, "方正姚体", "FZYaoti", simsun);
        map(fontMapper, "方正舒体", "FZShuTi", simsun);
        map(fontMapper, "华文细黑", "STXihei", simsun);
        map(fontMapper, "宋体扩展", "simsun-extB", simsun);
        map(fontMapper, "仿宋_GB2312", "FangSong_GB2312", simsun);
        map(fontMapper, "新細明體", "SimSun", simsun);

        // ===== Fix garbled text for "宋体(正文)" / "宋体(标题)" (body / heading theme fonts)
        //       and Traditional Chinese (Taiwan) fonts =====
        PhysicalFonts.put("PMingLiU", simsun);
        PhysicalFonts.put("新細明體", simsun);

        // 3. Attach the mapper to the package
        wordMLPackage.setFontMapper(fontMapper);

        // 4. Convert to PDF (XSL-FO + Apache FOP under the hood)
        try (OutputStream os = Files.newOutputStream(pdf)) {
            Docx4J.toPDF(wordMLPackage, os);
        }
    }

    // Maps a Word font name to an installed physical font; if that font is missing,
    // falls back to SimSun instead of registering a null mapping
    private static void map(IdentityPlusMapper mapper, String wordFont,
                            String physicalFont, PhysicalFont fallback) {
        PhysicalFont pf = PhysicalFonts.get(physicalFont);
        mapper.put(wordFont, pf != null ? pf : fallback);
    }
}
</pre></div>
<p>docx4j takes the <strong>XSL-FO + Apache FOP</strong> route: the document is converted to XSL-FO, and Apache FOP then lays that out into the PDF.</p>
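<p>Because <code>PhysicalFonts.get(...)</code> yields <code>null</code> when a font is not installed, it helps to know in advance which Word fonts will end up rendered with SimSun on a given server. A plain-Java sketch of that resolution step; the <code>installed</code> set is an assumption (fill it from whatever font list you trust), and <code>FontFallback</code> / <code>resolve</code> are hypothetical names:</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper mirroring the "installed font, else SimSun" idea with plain Java types
final class FontFallback {

    private FontFallback() {
    }

    // For each Word font name, returns the preferred physical font if it is installed,
    // otherwise the fallback; the iteration order of the input map is preserved
    static Map<String, String> resolve(Map<String, String> wordToPhysical,
                                       Set<String> installed,
                                       String fallback) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : wordToPhysical.entrySet()) {
            String physical = e.getValue();
            resolved.put(e.getKey(), installed.contains(physical) ? physical : fallback);
        }
        return resolved;
    }
}
```

<p>Listing the entries whose value equals the fallback gives a quick list of fonts worth installing before comparing docx4j output with LibreOffice.</p>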
<p class="maodian"></p><h2>7. Statistics Helper (P50 / P95)</h2>
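<div class="jb51code"><pre class="brush:java;">import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Stats {
    private final List<Long> times = new ArrayList<>();

    public void add(long nanos) {
        times.add(nanos);
    }

    // Nearest-rank percentile (p in (0, 1]) of the sorted samples, in milliseconds
    private double percentileMs(double p) {
        int rank = (int) Math.ceil(p * times.size()); // 1-based rank
        return times.get(Math.max(rank, 1) - 1) / 1e6;
    }

    public void print(String name) {
        if (times.isEmpty()) {
            System.out.printf("%s -> no samples%n", name);
            return;
        }
        Collections.sort(times);
        double avg = times.stream().mapToLong(x -> x).average().orElse(0) / 1e6;
        System.out.printf(
                "%s -> avg=%.2fms, p50=%.2fms, p95=%.2fms%n",
                name, avg, percentileMs(0.50), percentileMs(0.95)
        );
    }
}
</pre></div>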
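<p>With <code>rounds = 5</code> in <code>BenchMain</code>, the p95 index is 4 whichever rounding is used (<code>(int)(5 * 0.95)</code> or the nearest-rank <code>ceil(0.95 * 5) - 1</code>), which is the last element of the sorted list, so the reported p95 is simply the slowest run. A small sketch of the nearest-rank definition (1-based rank = ceil(p * n)); <code>NearestRank</code> and <code>nearestRankIndex</code> are hypothetical names:</p>

```java
// Hypothetical helper: 0-based index of the nearest-rank percentile in n sorted samples
final class NearestRank {

    private NearestRank() {
    }

    static int nearestRankIndex(int n, double p) {
        if (n <= 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        int rank = (int) Math.ceil(p * n); // 1-based rank
        return Math.max(rank, 1) - 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 20, 100}) {
            System.out.printf("n=%d -> p50 index=%d, p95 index=%d (max index=%d)%n",
                    n, nearestRankIndex(n, 0.50), nearestRankIndex(n, 0.95), n - 1);
        }
    }
}
```

<p>Under the nearest-rank rule, p95 stops coinciding with the maximum only once there are 20 or more samples, so more rounds are needed before the p95 column says anything beyond "worst run".</p>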
<p class="maodian"></p><h2>8. Benchmark Driver (BenchMain)</h2>
<div class="jb51code"><pre class="brush:java;">import java.nio.file.Files;
import java.nio.file.Path;

public class BenchMain {
    public static void main(String[] args) throws Exception {
        Path docx = Path.of("samples/sample.docx");
        Path out = Path.of("out");
        Files.createDirectories(out);

        // On Windows, pass the full path to soffice.exe instead of "soffice"
        LibreOfficeConverter lo =
                new LibreOfficeConverter("soffice");
        Docx4jConverter d4j =
                new Docx4jConverter();

        int rounds = 5;

        // LibreOffice: each call launches a new soffice process, so every round
        // includes process startup; every round writes out/sample.pdf
        Stats loStats = new Stats();
        for (int i = 0; i < rounds; i++) {
            long t0 = System.nanoTime();
            lo.convert(docx, out);
            loStats.add(System.nanoTime() - t0);
        }

        // docx4j: runs inside this JVM; round 1 also pays class loading, JIT warm-up
        // and one-time library initialization, so it is the cold start
        Stats d4jStats = new Stats();
        for (int i = 0; i < rounds; i++) {
            long t0 = System.nanoTime();
            d4j.convert(docx, out.resolve("d4j_" + i + ".pdf"));
            d4jStats.add(System.nanoTime() - t0);
        }

        loStats.print("LibreOffice");
        d4jStats.print("docx4j");
    }
}
</pre></div>
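<p>The test goals list cold vs. warm start and failure rate, but a plain timing loop mixes the first (cold) round into the same statistics and aborts on the first exception. A minimal sketch that keeps round 0 separate as the cold start and counts failures instead of stopping; treating only the first round as "cold" is an assumption, and <code>ColdWarmBench</code>, <code>Task</code> and <code>Result</code> are hypothetical names (the record needs JDK 16+, which the JDK 17 test environment satisfies):</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: times each round, keeps round 0 as the cold start,
// and counts failed rounds instead of aborting the whole benchmark
final class ColdWarmBench {

    private ColdWarmBench() {
    }

    // One conversion; may throw on failure
    @FunctionalInterface
    interface Task {
        void run(int round) throws Exception;
    }

    // coldNanos is -1 if the first round failed
    record Result(long coldNanos, List<Long> warmNanos, int failures, int rounds) {
        double failureRate() {
            return rounds == 0 ? 0.0 : (double) failures / rounds;
        }
    }

    static Result run(int rounds, Task task) {
        long cold = -1;
        List<Long> warm = new ArrayList<>();
        int failures = 0;
        for (int i = 0; i < rounds; i++) {
            long t0 = System.nanoTime();
            try {
                task.run(i);
            } catch (Exception e) {
                failures++; // counted, but not included in the timings
                continue;
            }
            long elapsed = System.nanoTime() - t0;
            if (i == 0) {
                cold = elapsed;
            } else {
                warm.add(elapsed);
            }
        }
        return new Result(cold, List.copyOf(warm), failures, rounds);
    }
}
```

<p>With the existing converter this would be called as <code>ColdWarmBench.run(rounds, i -> d4j.convert(docx, out.resolve("d4j_" + i + ".pdf")))</code>; the warm timings can then be fed to <code>Stats.add</code> to get avg/p50/p95 without the cold round.</p>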
<p class="maodian"></p><h2>9. Results</h2>
<p style="text-align:center"><img alt="" src="https://img.jbzj.com/file_images/article/202601/2026011309024951.png" /></p>
<p><strong>LibreOffice is consistently faster.</strong></p>
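<p class="maodian"></p><h2>10. Why Is the Gap So Large?</h2>
<p><strong>docx4j execution path</strong></p>
<blockquote><p>docx (XML)<br /> → JAXB<br /> → Java objects<br /> → XSL-FO<br /> → Apache FOP layout<br /> → PDF</p></blockquote>
<ul><li>Layout is computed in Java</li><li>Tables, pagination and font handling are very CPU-intensive</li><li>Heavy GC pressure from the intermediate object trees</li></ul>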
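<p>The GC-pressure point can be checked rather than assumed: the JDK's <code>GarbageCollectorMXBean</code> reports accumulated collection time per collector. A minimal sketch that measures how much GC time accrues while one conversion runs; <code>GcProbe</code> and <code>ThrowingTask</code> are hypothetical names, and because the counters are JVM-wide, the numbers are only meaningful when conversions run serially, as in this benchmark:</p>

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: GC time accumulated while a task runs (JVM-wide counters)
final class GcProbe {

    private GcProbe() {
    }

    @FunctionalInterface
    interface ThrowingTask {
        void run() throws Exception;
    }

    // Sum of GC time in ms over all collectors; getCollectionTime() returns -1 when unsupported
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // GC milliseconds that accrued while the task ran
    static long gcMillisDuring(ThrowingTask task) throws Exception {
        long before = totalGcMillis();
        task.run();
        return totalGcMillis() - before;
    }
}
```

<p>For example, <code>GcProbe.gcMillisDuring(() -> d4j.convert(docx, out.resolve("gc.pdf")))</code> gives the GC milliseconds for one docx4j conversion. It cannot measure LibreOffice, whose work happens in a separate native process.</p>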
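<p><strong>LibreOffice execution path</strong></p>
<blockquote><p>docx<br /> → LibreOffice's native layout engine<br /> → PDF</p></blockquote>
<ul><li>Native (C++) layout</li><li>No XML round trip through an intermediate format such as XSL-FO</li><li>Practically the same as "Save as PDF" in the desktop application</li></ul>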
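<p>In <code>BenchMain</code>, every LibreOffice call launches a new <code>soffice</code> process, so process startup is included in each measured time. <code>soffice</code> also accepts several input files in one invocation, which pays that startup once per batch instead of once per file. A sketch that only builds such a command line, reusing the flags from <code>LibreOfficeConverter</code>; <code>SofficeBatch</code> and <code>batchCommand</code> are hypothetical names:</p>

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one soffice command line that converts all inputs to PDF in outDir
final class SofficeBatch {

    private SofficeBatch() {
    }

    static List<String> batchCommand(String soffice, Path outDir, List<Path> inputs) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("no input files");
        }
        List<String> cmd = new ArrayList<>(List.of(
                soffice,
                "--headless",
                "--nologo",
                "--nolockcheck",
                "--nodefault",
                "--nofirststartwizard",
                "--convert-to", "pdf",
                "--outdir", outDir.toString()));
        for (Path in : inputs) {
            cmd.add(in.toString()); // every input file goes at the end of the command
        }
        return cmd;
    }
}
```

<p>The returned list can be passed straight to <code>new ProcessBuilder(...)</code>. The trade-off is that per-file timings and per-file failures are no longer visible, so each expected PDF should be checked after the batch finishes.</p>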
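<p class="maodian"></p><h2>11. Conclusion</h2>
<p><strong>If you are a Java backend developer converting real business documents:</strong></p>
<ul><li>Performance: LibreOffice</li><li>Stability: LibreOffice</li><li>Fidelity (layout accuracy): LibreOffice</li></ul>
<p>docx4j is only a good fit for:</p>
<ul><li>Simple templates</li><li>Low QPS</li><li>Scenarios that must be pure Java</li></ul>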
<p><strong>docx4j output can have rendering problems:</strong></p>
<p style="text-align:center"><img alt="" src="https://img.jbzj.com/file_images/article/202601/2026011309024949.png" /></p>
<p><strong>LibreOffice output is correct:</strong></p>
<p style="text-align:center"><img alt="" src="https://img.jbzj.com/file_images/article/202601/2026011309024926.png" /></p>
</div>