Removing font color from an HTML string

I have a simple HTML string, for example:

<p dir="ltr"><a href="xxxx://viewstudent/MeTdMw9Ndj" class="favourite" data="MeTdMw9Ndj"><font color="#009a49">Good evening</font></a></p>

I want the output to be:

 <p dir="ltr"><a href="xxxx://viewstudent/MeTdMw9Ndj" class="favourite" data="MeTdMw9Ndj">Good evening</a></p>

How can I achieve this?

What I have tried:

//removing font tags
        Document doc = Jsoup.parse(webText);
        Elements elements = doc.select("font");

        //remove all 'font' tags (note: this also drops their contents, e.g. the text "Good evening")
        elements.remove();
        webText = doc.toString();

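Using a regular expression: search for the pattern (?i)</?font[^>]*> and replace it with "". Note that "\/" is not a valid escape in a Java string literal, so the slash is written unescaped.

        String cleanstr = "<p dir='ltr'><a href='xxxx://viewstudent/MeTdMw9Ndj' class='favourite' data='MeTdMw9Ndj'><font color='#009a49'>Good evening</font></a></p>";
        cleanstr = cleanstr.replaceAll("(?i)</?font[^>]*>", "");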
        System.out.println(cleanstr);
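
One caveat with the pattern above is that it also matches tags whose names merely start with "font" (a hypothetical <fontx>, say). Here is a minimal sketch of a slightly stricter variant, assuming the markup never contains ">" inside an attribute value. The names FontTagStripper and stripFontTags are made up for illustration.

```java
import java.util.regex.Pattern;

class FontTagStripper {
    // Matches <font ...> and </font>, case-insensitively. The lookahead requires
    // whitespace, '/' or '>' right after "font", so names such as <fontx> are left alone.
    private static final Pattern FONT_TAG =
            Pattern.compile("(?i)</?font(?=[\\s/>])[^>]*>");

    // Returns the input with every font tag removed; the text between the tags is kept.
    static String stripFontTags(String html) {
        return FONT_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p dir='ltr'><a href='xxxx://viewstudent/MeTdMw9Ndj' class='favourite' "
                + "data='MeTdMw9Ndj'><font color='#009a49'>Good evening</font></a></p>";
        System.out.println(stripFontTags(html));
    }
}
```

Compiling the pattern once is only a convenience for repeated calls; for a one-off replacement, String.replaceAll with the same pattern should behave the same way.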

If you prefer a regular expression, you can also use: </{0,1}font.*?>

String html = "<p dir='ltr'><a href='xxxx://viewstudent/MeTdMw9Ndj' class='favourite' "
        + "data='MeTdMw9Ndj'><font color='#009a49'>Good evening</font></a></p>";
html = html.replaceAll("</{0,1}font.*?>", "");
System.out.println(html);

Output:

<p dir='ltr'><a href='xxxx://viewstudent/MeTdMw9Ndj' class='favourite' data='MeTdMw9Ndj'>Good evening</a></p>

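Your attempt removes the font elements together with everything inside them. Instead, take all child nodes of each font node with Node.childNodes() and insert them into the parent node with Element.insertChildren(int index, Collection<? extends Node> children) at the index of the font node (which can be retrieved with Node.siblingIndex()). Once the children have been moved, the now-empty font nodes can be removed.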

Document doc = Jsoup.parse(webText);
Elements elements = doc.select("font");

for (Element e: elements) {
    e.parent().insertChildren(e.siblingIndex(), e.childNodes());
}

elements.remove();
webText = doc.toString();

I have tested the code on Java 7 with several versions of Jsoup (1.7.2, 1.7.3, and 1.8.1). All of them produced the expected result.

Here is my test code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SO27854788 {
    public static void main(String[] args) {
        Document doc = Jsoup.parse("<font color=\"#009a49\">Good evening <font color=\"#009a49\">Good evening</font> <font color=\"#009a49\">Good evening <font color=\"#009a49\">Good evening</font></font> <font color=\"#009a49\">Good evening</font></font><p dir=\"ltr\"><a href=\"xxxx://viewstudent/MeTdMw9Ndj\" class=\"favourite\" data=\"MeTdMw9Ndj\"><font color=\"#009a49\">Good evening</font></a></p><p dir=\"ltr\"><a href=\"xxxx://viewstudent/MeTdMw9Ndj\" class=\"favourite\" data=\"MeTdMw9Ndj\"><font color=\"#009a49\">Good evening. Here are some <span>more tags inside</span></font></a></p>");
        Elements elements = doc.select("font");

        for (Element e: elements) {
            e.parent().insertChildren(e.siblingIndex(), e.childNodes());
        }

        elements.remove();

        System.out.println(doc.toString());
    }
}

And the output:

<html>
 <head></head>
 <body>
  Good evening Good evening Good evening Good evening Good evening
  <p dir="ltr"><a href="xxxx://viewstudent/MeTdMw9Ndj" class="favourite" data="MeTdMw9Ndj">Good evening</a></p>
  <p dir="ltr"><a href="xxxx://viewstudent/MeTdMw9Ndj" class="favourite" data="MeTdMw9Ndj">Good evening. Here are some <span>more tags inside</span></a></p>
 </body>
</html>
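
The unwrap step used in this answer (move a node's children up into its parent, then drop the node) is not specific to Jsoup. As an illustration only, here is a minimal sketch of the same idea using the W3C DOM classes from the Java standard library in place of Jsoup. It assumes the input is well-formed XML whose root element is not itself a font element, which real-world HTML often is not. The names UnwrapSketch and unwrapAll are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UnwrapSketch {
    // Removes every <tag> element but keeps its children in place.
    // Assumption: xml is well-formed and its root element is not a <tag> element.
    static String unwrapAll(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Snapshot the matches first, because the NodeList is live.
            NodeList live = doc.getElementsByTagName(tag);
            List<Node> matches = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                matches.add(live.item(i));
            }

            for (Node node : matches) {
                Node parent = node.getParentNode();
                // insertBefore moves the child, so this loop empties the node.
                while (node.getFirstChild() != null) {
                    parent.insertBefore(node.getFirstChild(), node);
                }
                parent.removeChild(node);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked parser/transformer exceptions to keep the sketch short.
            throw new IllegalArgumentException("Could not process input", e);
        }
    }
}
```

Calling UnwrapSketch.unwrapAll(markup, "font") on the sample paragraph should leave the link text directly inside the a element. For arbitrary HTML, the Jsoup version above remains the more practical choice because its parser tolerates markup that is not well-formed.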