Read data from a Word document in .docx format as individual fields and save it in a database in Java

Question

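Is it possible to read data from a .docx file field by field, so that each field can be saved in a database? It has to be done in Java. For example, we have a Word form document such as a CV, and we need to read each field (name, surname, age, position, date) so that it is stored in the database as a separate field rather than in one large text column. There are two Java libraries for this, Apache POI and docx4j, but what I found only provides a way to save the data as one big chunk in a single text field. Each field should instead be separated out as its own element.

I got it working, but the data is saved as one big chunk, because that is the only form in which the resulting data comes out.

I have not found any way to do this. Do you have any suggestions?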

Answer 1

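You need to parse the Microsoft Word document, based on the sample input you provided, and extract the specific value from each line.

First, here is the format of the test file I used. I put it in my local directory, and it follows the same format as your example image: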

Employee

Name: Bob

Surname: Smith

Age: 28

Position: Developer

Date: 6/26/18

import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class Test {

    public static void main(String[] args) {
        // exampleFile.docx is the layout file you provided, with data added for testing
        List<String> values = parseWordDocument("exampleFile.docx");

        for (String s : values) {
            System.out.println(s);
        }
    }

    public static List<String> parseWordDocument(String documentPath) {
        List<String> parsedValues = new ArrayList<>();

        // try-with-resources closes the stream and the document even if parsing fails
        try (FileInputStream fInput = new FileInputStream(documentPath);
             XWPFDocument document = new XWPFDocument(fInput)) {

            // getParagraphs() will grab each paragraph for you
            for (XWPFParagraph para : document.getParagraphs()) {
                String text = para.getText().trim();

                // skip the title and any line that has no "Label: value" shape
                if (text.equals("Employee") || !text.contains(":")) {
                    continue;
                }

                // split on the first colon only, so a value may itself contain ':'
                String[] splitLine = text.split(":", 2);
                // based on the example input file, [1] is the value you need
                parsedValues.add(splitLine[1].trim());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return parsedValues;
    }

}

With this, the output I get from the list created by parseWordDocument() is:

Bob

Smith

28

Developer

6/26/18
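
The list above is positional: the name is recognised only because it comes first. If the form's fields may appear in a different order, it could help to keep the labels as well. The following is a minimal sketch of that idea, not part of the original answer. The class name FieldParser and the method parseFields are hypothetical, and the sketch works on plain strings (for example the result of para.getText() for each paragraph), so it runs without Apache POI.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldParser {

    // Turns lines such as "Name: Bob" into label -> value pairs, kept in document order.
    public static Map<String, String> parseFields(List<String> lines) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : lines) {
            // Split on the first colon only, so a value may itself contain ':'
            String[] parts = line.split(":", 2);
            if (parts.length < 2) {
                continue; // title, blank line, or anything else without a label
            }
            fields.put(parts[0].trim(), parts[1].trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Same content as the test file shown earlier, one string per paragraph
        List<String> lines = List.of("Employee", "Name: Bob", "Surname: Smith",
                "Age: 28", "Position: Developer", "Date: 6/26/18");
        // prints {Name=Bob, Surname=Smith, Age=28, Position=Developer, Date=6/26/18}
        System.out.println(parseFields(lines));
    }
}
```

With a map, each database column can be filled by looking up its label, for example fields.get("Surname"), instead of relying on the position in the list.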

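So now you can simply take the list returned by parseWordDocument(), loop over it (instead of printing the values), and create the appropriate SQLite queries.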
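
As a rough sketch of that last step, the values can be bound to a JDBC PreparedStatement so that each one lands in its own column. Everything named here is an assumption made for illustration: the class EmployeeWriter, the method insertEmployee, and a table employee(name, surname, age, position, hire_date) are hypothetical, and the list is assumed to hold exactly the five values in the order of the sample file.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class EmployeeWriter {

    // Hypothetical table layout: one column per field of the form
    public static final String INSERT_SQL =
            "INSERT INTO employee (name, surname, age, position, hire_date) "
            + "VALUES (?, ?, ?, ?, ?)";

    // Expects the values in sample-file order: name, surname, age, position, date
    public static void insertEmployee(Connection conn, List<String> values)
            throws SQLException {
        if (values.size() != 5) {
            throw new IllegalArgumentException(
                    "Expected 5 values but got " + values.size());
        }
        // Placeholders keep the document text out of the SQL string itself
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, values.get(0).trim());
            ps.setString(2, values.get(1).trim());
            ps.setInt(3, Integer.parseInt(values.get(2).trim())); // age as a number
            ps.setString(4, values.get(3).trim());
            ps.setString(5, values.get(4).trim()); // date kept as text, as written in the form
            ps.executeUpdate();
        }
    }
}
```

The Connection would come from whichever JDBC driver is in use. For SQLite this typically means a third-party driver on the classpath and a call such as DriverManager.getConnection("jdbc:sqlite:employees.db"), where the file name is again only an example.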
