Converting Word to HTML in Java (images stored as Base64)

2023-06-24

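As the title says, the requirement is to convert Word doc (docx) documents to HTML in Java. The Word documents also contain images, which need to be converted to Base64 and embedded in the HTML.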
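For reference, "storing an image as Base64 in the HTML" means writing the image bytes into the `src` attribute of an `<img>` tag as a data URI. The following is a minimal sketch of that step using only the JDK; the class name `DataUriSketch` and the method `toDataUri` are hypothetical names chosen for illustration and are not part of POI.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper (not part of POI): builds the value for <img src="...">.
final class DataUriSketch {

    // mimeType is e.g. "image/png"; content holds the raw image bytes.
    static String toDataUri(String mimeType, byte[] content) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        // Placeholder bytes; a real caller would pass the bytes of an actual image.
        byte[] placeholder = "not a real image".getBytes(StandardCharsets.UTF_8);
        System.out.println("<img src=\"" + toDataUri("image/png", placeholder) + "\">");
    }
}
```

The MIME type in the prefix should match the actual image format, otherwise some browsers may fail to render the picture.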

The libraries needed are Apache POI and XDocReport.

The relevant Maven dependencies are as follows:

pom.xml

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.3</version>
</dependency>

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.3</version>
</dependency>

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>5.2.3</version>
</dependency>

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.document</artifactId>
    <version>2.0.1</version>
</dependency>

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>
    <version>2.0.1</version>
</dependency>

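The code for converting a .doc file directly to HTML is as follows (it uses POI's HWPF classes, which read only the binary .doc format):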

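import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;
import org.w3c.dom.Document;

public class Test {
    public static void main(String[] args) throws Exception {
        // Note: if the file can be read but loading the document fails,
        // check whether the document was saved with Microsoft Office Word.
        HWPFDocument wordDocument = (HWPFDocument) WordToHtmlUtils.loadDoc(new File("/Users/Terry/Downloads/word.doc"));
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument()
        );
        // Embed each picture as a Base64 data URI, using the MIME type POI detected
        // instead of assuming every picture is a PNG.
        PicturesManager picturesManager = (content, pictureType, suggestedName, widthInches, heightInches) ->
                "data:" + pictureType.getMime() + ";base64," + Base64.getEncoder().encodeToString(content);
        wordToHtmlConverter.setPicturesManager(picturesManager);
        // Parse the Word document into an HTML DOM.
        wordToHtmlConverter.processDocument(wordDocument);
        Document htmlDocument = wordToHtmlConverter.getDocument();
        // Serialize the DOM to a file; try-with-resources closes the stream.
        try (OutputStream out = new FileOutputStream("/Users/Terry/Downloads/dest.html")) {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));
        }
    }
}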
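
To sanity-check the generated file, one option is to count the `src` attributes that carry a Base64 data URI. The sketch below uses only the JDK and assumes the converter writes pictures as `src="data:image/...;base64,..."` attributes. The class `EmbeddedImageCheck` and the method `countEmbeddedImages` are hypothetical names, and the regular expression is a rough check rather than a full HTML parser.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts src attributes that hold a Base64 image data URI.
final class EmbeddedImageCheck {

    private static final Pattern DATA_URI_SRC = Pattern.compile(
            "src\\s*=\\s*\"data:image/[^;\"]+;base64,", Pattern.CASE_INSENSITIVE);

    static int countEmbeddedImages(String html) {
        Matcher matcher = DATA_URI_SRC.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: EmbeddedImageCheck <path-to-html>");
            return;
        }
        // Read the generated HTML (written as UTF-8) and report the count.
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("embedded images: " + countEmbeddedImages(html));
    }
}
```

If the count is zero for a document that is known to contain pictures, the pictures manager was probably not set before `processDocument` was called.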

 

admin
