JEditorPane text/html Elements get HTML inside element

Question

I am creating an HTML editor in Java Swing. It uses a JEditorPane with the text/html MIME type. I have a situation with the following HTML structure:

<body>
    <p>This is a <b>BOLD</b> word in a sentence</p>
</body>

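When the cursor is placed in that sentence and someone clicks the "list" button, the HTML is modified by creating a new list, with the paragraph containing the cursor as the first list item. Like this: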

<body>
    <ol>
        <li>
            <p>This is a <b>BOLD</b> word in a sentence</p>
        </li>
    </ol>
</body>

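So far I can get it to create the list elements, but I cannot get the bold tag inserted at the right place in the new list. In other words, I can create the list item, but the bold tag is lost.

I need some way to get the inner or outer HTML of an Element object (in this case the paragraph element) so that I can copy the contents in full, including the bold tag. So far I can only copy the text inside the <p> tag, which does not include the bold tag.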

Here is my code so far. It lives in an extended editor pane object. htmlDoc_ is the editor pane's HTMLDocument.

public void toggledListButton() {
    
    // turning the paragraph into a list
    
    // get the paragraph element, cursor should always be inside
    // a paragraph somewhere
    Element elem = htmlDoc_.getParagraphElement( this.getCaretPosition() );
    
    int caretPos = this.getCaretPosition();
    int elemStart = elem.getStartOffset();
    int elemEnd = elem.getEndOffset();

    String elemText = "" ;
    try {
        elemText = htmlDoc_.getText(elemStart, elemEnd - elemStart);
    } catch (BadLocationException e1) {
        e1.printStackTrace();
    }

    try {
        htmlDoc_.setOuterHTML(elem, "<ol><li><p>" + elemText + "</p></li></ol>");
    } catch (BadLocationException e1) {
        e1.printStackTrace();
    } catch (IOException e1) {
        e1.printStackTrace();
    }
    
    // amount of text doesnt change, so we can just set the caretPos where it was
    this.setCaretPosition(caretPos);
    this.requestFocusInWindow();
    
}

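If I could somehow get the inner HTML of the "elem" Element, I think I would have what I need to insert into the new list. Alternatively, I could pass the element to JSoup and extract the HTML that way, but I don't know how to pass the element to JSoup.

EDIT ----------------

Based on the comment below about traversing the elements, I made this change. It takes the "elem" variable and tries to loop over each child of the paragraph, building up the paragraph's HTML that way. The problem is that it does not seem to detect the tags as separate elements; it only detects 3 text/leaf elements.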

    String paragraphHTML = "";
    for (int i = 0; i < elem.getElementCount(); i++) {
      
        Element child = elem.getElement(i);
        if (child.isLeaf()) {
            try {
                paragraphHTML += child.getDocument().getText(0, child.getDocument().getLength());
            } catch (BadLocationException e) {
                e.printStackTrace();
            }
            
        } else {
            paragraphHTML += "<" + child.getName() + ">";   
        }
        
    }
    System.out.println("paragraphHTML=" + paragraphHTML);

paragraphHTML is output as text only, without the tags. How would I detect the tags? Thanks.

Answer 1

If I could somehow get the inner HTML of the "elem" Element,

You can use the HTMLEditorKit.write(...) method to write out the text and tags of the paragraph at the caret position:

import java.awt.*;
import java.io.*;
import java.util.*;
import javax.swing.*;
import javax.swing.event.*;
import javax.swing.text.*;

public class EditorPaneExtract extends JPanel implements CaretListener
{
    private JEditorPane editor;
    private JTextArea partial;
    private JLabel extracted;

    public EditorPaneExtract() throws Exception
    {
        setLayout( new BorderLayout() );

        String text = "<html><head><title>Title</title><body><pre>123456789</pre><p>Line one with <b>bold</b> text</p><p>Line two with <i>italic</i> text</p></body></html>";

        editor = new JEditorPane();
        editor.setContentType( "text/html" );
        editor.setText( text );
        editor.addCaretListener( this );

        JScrollPane scrollPane = new JScrollPane( editor );
        scrollPane.setPreferredSize( new Dimension(400, 120) );
        add(scrollPane, BorderLayout.PAGE_START);

        JTextArea full = new JTextArea(20, 25);
        full.setEditable( false );
        add(new JScrollPane(full), BorderLayout.LINE_START);

        full.setText( editor.getText() );

        partial = new JTextArea(20, 25);
        partial.setEditable( false );
        add(new JScrollPane(partial), BorderLayout.LINE_END);

        extracted = new JLabel(" ");
        add(extracted, BorderLayout.PAGE_END);
    }

    @Override
    public void caretUpdate(CaretEvent e)
    {
        try
        {
            int offset = editor.getCaretPosition();

            StyledDocument doc = (StyledDocument)editor.getDocument();
            Element paragraph = doc.getParagraphElement(offset);
            int start = paragraph.getStartOffset();
            int end = paragraph.getEndOffset();

            StringWriter writer = new StringWriter();
            editor.getEditorKit().write(writer,  editor.getDocument(), start, end - start);

            partial.setText( writer.toString() );

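            // Strip the wrapping "html" and "body" tags. The offsets are
            // hard-coded to match the layout of the writer's output.
            StringBuilder sb = new StringBuilder( writer.toString() );
            sb.delete(sb.length() - 19, sb.length() -1);
            sb.delete(0, 20);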

            extracted.setText( sb.toString() );
        }
        catch (Exception e1)
        {
            e1.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception
    {
        JFrame frame = new JFrame();
        frame.setDefaultCloseOperation( JFrame.EXIT_ON_CLOSE );
        frame.add( new EditorPaneExtract() );
        frame.pack();
        frame.setLocationRelativeTo( null );
        frame.setVisible(true);
    }
}

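In the code above:

the top component is the editor pane
the left component shows the full HTML text
the right component shows the extracted HTML. Note that the "html" and "body" tags are always included
the bottom component shows the extracted HTML without the "html" and "body" tags.

Just move the caret from line to line to see the difference.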
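
To tie this back to the question, here is a minimal sketch of the same HTMLEditorKit.write(...) technique as a helper, run against the question's sample paragraph without any visible component. The class name ParagraphHtml and its methods parse, offsetOf and paragraphHtmlAt are hypothetical names made up for illustration. Instead of fixed character offsets, it assumes the writer's output contains a "body" start and end tag and cuts between them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class; the name and methods are not from the original post.
final class ParagraphHtml {

    // Parse an HTML string into a fresh HTMLDocument.
    static HTMLDocument parse(HTMLEditorKit kit, String html) {
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    // Document offset of the first occurrence of needle in the plain text.
    static int offsetOf(Document doc, String needle) {
        try {
            int index = doc.getText(0, doc.getLength()).indexOf(needle);
            if (index < 0) {
                throw new IllegalArgumentException("not found: " + needle);
            }
            return index;
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    // HTML (tags included) of the paragraph containing the given offset.
    static String paragraphHtmlAt(HTMLEditorKit kit, HTMLDocument doc, int offset) {
        Element paragraph = doc.getParagraphElement(offset);
        int start = paragraph.getStartOffset();
        int end = paragraph.getEndOffset();

        StringWriter out = new StringWriter();
        try {
            kit.write(out, doc, start, end - start);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }

        // Assumption: the output is wrapped in html/body tags; keep only
        // what sits between the body start tag and the body end tag.
        String html = out.toString();
        int bodyOpen = html.indexOf("<body");
        int bodyClose = html.lastIndexOf("</body>");
        if (bodyOpen < 0 || bodyClose < 0) {
            return html.trim();
        }
        int contentStart = html.indexOf('>', bodyOpen) + 1;
        return html.substring(contentStart, bodyClose).trim();
    }

    public static void main(String[] args) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = parse(kit,
            "<html><body><p>This is a <b>BOLD</b> word in a sentence</p></body></html>");
        // Print the paragraph markup, bold tag included.
        System.out.println(paragraphHtmlAt(kit, doc, offsetOf(doc, "BOLD")));
    }
}
```

In the question's toggledListButton() method, a fragment obtained this way could probably take the place of elemText. Since the fragment already carries its own paragraph tags, the call would look more like htmlDoc_.setOuterHTML(elem, "<ol><li>" + fragment + "</li></ol>"). I have not tested that last step, so treat it as a starting point.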
