How can the internal field labels of the editable fields in an AcroForm PDF be found and listed?

Question

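The labels are needed so that they can be used to fill in the PDF programmatically.

I would like to use PDFBox or iText, but the PDF is complex and neither library seems able to do this.

Is there software or code that finds and lists the internal field labels of the editable fields?

Any help is greatly appreciated, thanks.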

Answer 1

Please refer to the following code:

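import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

public class PDFBOX {

    public static void main(String[] args) throws IOException {
        // Backslashes in Windows paths must be escaped in Java string literals.
        File pdfFile = new File("C:\\Users\\pc\\Desktop\\Req-form.pdf");

        // try-with-resources closes the document even if an exception is thrown (PDFBox 2.x API).
        try (PDDocument fdeb = PDDocument.load(pdfFile)) {
            PDDocumentCatalog pdCatalog = fdeb.getDocumentCatalog();
            PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

            // Text fields: getField returns null if no field has this name
            PDField firstName = pdAcroForm.getField("firstName");
            if (firstName != null) {
                firstName.setValue("firstName");
            }

            fdeb.save("C:\\Users\\Desktop\\Test.pdf");
        }
    }

}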

Answer 2

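Unfortunately, the question is not entirely clear. In particular, the term "internal label" is very vague. In this answer I assume it means the fully qualified field name, which is constructed from the partial field names of the field and all of its ancestors, as described in section 12.7.3.2 of the PDF specification (ISO 32000-1). If that is not the "internal label" you are asking about, please define the term properly.

It is also unclear how the code should "organize them into an object"; I assume that adding them to a list qualifies.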
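To make the notion of a fully qualified field name concrete, here is a minimal sketch of how such a name is assembled: the partial names of the field and its ancestors are joined with periods, root first. `FieldNode` and the sample names are hypothetical stand-ins for illustration only; they are not PDFBox or iText classes, and the libraries compute this for you.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a form field node; not a PDFBox or iText class.
class FieldNode {
    final String partialName; // the field's partial name (may be null if absent)
    final FieldNode parent;   // null for a root field

    FieldNode(String partialName, FieldNode parent) {
        this.partialName = partialName;
        this.parent = parent;
    }

    // Walks up to the root and joins the partial names with periods, root first.
    String fullyQualifiedName() {
        Deque<String> parts = new ArrayDeque<>();
        for (FieldNode node = this; node != null; node = node.parent) {
            if (node.partialName != null) {
                parts.addFirst(node.partialName);
            }
        }
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        FieldNode applicant = new FieldNode("applicant", null);
        FieldNode address = new FieldNode("address", applicant);
        FieldNode street = new FieldNode("street", address);
        System.out.println(street.fullyQualifiedName());
    }
}
```

The string produced this way is the kind of name that is passed to a lookup such as `getField(...)` when filling the form.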

iText 7

With iText 7.0.1 you can retrieve a list of the fully qualified names of the form fields using the following method:

List<String> getFormFieldNames(PdfDocument pdfDocument)
{
    PdfAcroForm pdfAcroForm = PdfAcroForm.getAcroForm(pdfDocument, false);
    if (pdfAcroForm == null)
        return Collections.emptyList();

    List<String> result = new ArrayList<>(pdfAcroForm.getFormFields().keySet());
    return result;
}

(iText ShowFormFieldNames, method getFormFieldNames)

PDFBox 2

With PDFBox 2.0.3 you can retrieve a list of the fully qualified names of the form fields using the following method:

List<String> getFormFieldNames(PDDocument pdDocument)
{
    PDAcroForm pdAcroForm = pdDocument.getDocumentCatalog().getAcroForm();
    if (pdAcroForm == null)
        return Collections.emptyList();

    List<String> result = new ArrayList<>();
    for (PDField pdField : pdAcroForm.getFieldTree())
    {
        if (pdField instanceof PDTerminalField)
        {
            result.add(pdField.getFullyQualifiedName());
        }
    }
    return result;
}

(PDFBox ShowFormFieldNames, method getFormFieldNames)

Or, a bit fancier (requires Java 8):

List<String> getFormFieldNamesFancy(PDDocument pdDocument)
{
    PDAcroForm pdAcroForm = pdDocument.getDocumentCatalog().getAcroForm();
    if (pdAcroForm == null)
        return Collections.emptyList();

    return StreamSupport.stream(pdAcroForm.getFieldTree().spliterator(), false)
                        .filter(field -> (field instanceof PDTerminalField))
                        .map(field -> field.getFullyQualifiedName())
                        .collect(Collectors.toList());
}

(PDFBox ShowFormFieldNames, method getFormFieldNamesFancy)
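The fancier variant goes through `StreamSupport` because the field tree is consumed as an `Iterable`, which has no `stream()` method of its own. The following library-free sketch shows the same filter/map/collect idiom on a plain `Iterable`; the `Field` class, its `terminal` flag, and the sample names are assumptions made up for illustration and only mimic the `PDTerminalField` check.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class IterableStreamSketch {
    // Hypothetical stand-in for a form field: a name plus a terminal flag.
    static final class Field {
        final String name;
        final boolean terminal;

        Field(String name, boolean terminal) {
            this.name = name;
            this.terminal = terminal;
        }
    }

    // Turns any Iterable into a sequential stream, keeps terminal fields, collects their names.
    static List<String> terminalNames(Iterable<Field> fields) {
        return StreamSupport.stream(fields.spliterator(), false)
                .filter(field -> field.terminal)
                .map(field -> field.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Iterable<Field> tree = Arrays.asList(
                new Field("applicant", false),
                new Field("applicant.firstName", true),
                new Field("applicant.lastName", true));
        System.out.println(terminalNames(tree));
    }
}
```

Non-terminal entries are dropped because they only group other fields and do not hold a value that could be filled in.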

Answer 3

iText can be used to list the internal fields of an AcroForm PDF. Here is a command-line program, buildable with IntelliJ, that does this:

https://github.com/powerblue/use_iText_to_get_internal_fields

package fr.jp.pdf;

import java.io.IOException;
import java.net.URL;
import java.text.ParseException;
import java.util.Arrays;
import java.util.Hashtable;
import java.util.Map;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FormParser {
    public static final String VERSION = "16.10.13.1";
    private static final Logger LOGGER = LoggerFactory.getLogger(FormParser.class);
    public static final String PARAM_KEY__GET_LIST = "-LIST_FIELDS";
    private static final String MSG_HELP = "usage:\n> java -cp $CLASSPATH fr.jp.pdf.FormParser -LIST_FIELDS pdf_file\n";

    public static void main(String[] args) {
        LOGGER.debug("Start, v{}", VERSION);
        FormParser formParser = new FormParser();

        try {
            String mode = formParser.getMode(args);
            switch (mode) {
                case PARAM_KEY__GET_LIST:
                    int idx_file_name = (args[0].equals(PARAM_KEY__GET_LIST)) ? 1 : 0;
                    formParser.printFields(args[idx_file_name]);
                    break;
                default:
                    printHelp();
            }
        } catch (Exception e) {
            LOGGER.error("Problem: ", e);
        }
        LOGGER.debug("Finish");
    }

    private String getMode(String[] args) {
        LOGGER.debug("Start with {}", Arrays.asList(args));
        String mode = "UNKNOWN";
        if (args.length > 0) {
            // binarySearch requires a sorted array, so use a plain membership test instead
            if (Arrays.asList(args).contains(PARAM_KEY__GET_LIST) && (args.length == 2)) {
                mode = PARAM_KEY__GET_LIST;
            } else {
                LOGGER.warn("Invoke with unknown params: {}", Arrays.asList(args));
            }
        } else {
            LOGGER.warn("Invoke with empty arguments! Don't run.");
        }
        LOGGER.debug("Finish, return: [{}]", mode);
        return mode;
    }

    private void printFields(String file_name) throws IOException {
        LOGGER.debug("Start for [{}]", file_name);
        LOGGER.trace("Try open PDF file: [{}]", file_name);

        URL fileURL = getClass().getClassLoader().getResource(file_name);
        if (fileURL == null) throw new IOException("NOT FOUND File \"" + file_name + "\"");

        PdfReader pdfReader = null;
        try {
            String src = fileURL.getFile();
            LOGGER.debug("Try open PDF file: [{}]", src);
            // "new" never returns null, so no null check is needed after construction
            pdfReader = new PdfReader(src);
            LOGGER.debug("PdfReader opened successfully");

            AcroFields acroFields = pdfReader.getAcroFields();
            LOGGER.debug("AcroFields form obtained: [{}]", acroFields);
            if (acroFields == null) throw new IOException("AcroFields do not exist!");

            Map<String, AcroFields.Item> fields = acroFields.getFields();
            LOGGER.debug("Field count: [{}]", fields.size());
            int i = 0;
            for (String field_key : fields.keySet()) {
                AcroFields.Item field = fields.get(field_key);
                LOGGER.info("{}. [Page:{}, tabOrder:{}, Field.size:{}, Key:{}] ", ++i, field.getPage(0), field.getTabOrder(0), field.size(), field_key);
            }
        } catch (IOException e) {
            LOGGER.error("Problem: ", e);
        } finally {
            if (pdfReader != null)
                pdfReader.close();
            LOGGER.trace("PdfReader closed");
        }
        LOGGER.debug("Finish processing PDF doc [{}]", file_name);
    }

    /**
     * parse key=value array as single String
     *
     * @param key_values_string - key=value array as single String
     * @return -
     */
    public static Hashtable<String, String> prepareData(String key_values_string) throws ParseException {
        LOGGER.debug("Invoke with [{}]", key_values_string);
        Hashtable<String, String> result = new Hashtable<>();
        String[] key_values_arr = key_values_string.split(";");
        LOGGER.trace("Found the {} pair key=value", key_values_arr.length);
        for (String key_value : key_values_arr) {
            String[] key_value_pair_arr = key_value.split("=");
            if (key_value_pair_arr.length != 2) {
                throw new ParseException("Wrong format \"key=value\" pair: " + key_value, 0);
            }
            if (result.containsKey(key_value_pair_arr[0])) {
                throw new ParseException("Found duplicate Key: " + key_value_pair_arr[0], 0);
            }
            result.put(key_value_pair_arr[0], key_value_pair_arr[1]);
        }
        return result;
    }

    private static void printHelp() {
        LOGGER.debug("Invoke");
        System.out.println(MSG_HELP);
    }
}
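
The `prepareData` helper in the program above is never called from `main`, so the input it expects is easy to miss: a single string of `key=value` pairs separated by semicolons. Here is a library-free sketch of the same parsing, assuming that format; the class name `KeyValueSketch`, the use of `IllegalArgumentException` instead of `ParseException`, and the sample field names are choices made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueSketch {
    // Parses "k1=v1;k2=v2" into an insertion-ordered map.
    // Rejects entries that are not exactly "key=value" and duplicate keys.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : input.split(";")) {
            String[] keyValue = pair.split("=");
            if (keyValue.length != 2) {
                throw new IllegalArgumentException("Wrong \"key=value\" pair: " + pair);
            }
            // putIfAbsent returns the existing value when the key is already present
            if (result.putIfAbsent(keyValue[0], keyValue[1]) != null) {
                throw new IllegalArgumentException("Duplicate key: " + keyValue[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("firstName=Ada;lastName=Lovelace"));
    }
}
```

A map like this, keyed by the field names that the listing step prints, would be the natural input for a later step that fills the form.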
