Removing font color from html string
I have a simple HTML string, for example:
<p dir="ltr"><a href="xxxx://viewstudent/MeTdMw9Ndj" class="favourite" data="MeTdMw9Ndj"><font color="#009a49">Good evening</font></a></p>
I want the output to be:
<p dir="ltr"><a href="xxxx://viewstudent/MeTdMw9Ndj" class="favourite" data="MeTdMw9Ndj">Good evening</a></p>
How can I achieve this?
My attempt:
// removing font tags
Document doc = Jsoup.parse(webText);
Elements elements = doc.select("font");
// remove all 'font' tags (this also removes their content, so "Good evening" is lost as well)
elements.remove();
webText = doc.toString();
Using a regular expression: search for the pattern (?i)<\/?font[^>]*>
and replace it with ""
String cleanstr = "<p dir='ltr'><a href='xxxx://viewstudent/MeTdMw9Ndj' class='favourite' data='MeTdMw9Ndj'><font color='#009a49'>Good evening</font></a></p>";
// "\/" is an illegal escape in a Java string literal; the slash needs no escaping
cleanstr = cleanstr.replaceAll("(?i)</?font[^>]*>", "");
System.out.println(cleanstr);
If you want to use a regular expression, you can use: <\/{0,1}font.*?>
String html = "<p dir='ltr'><a href='xxxx://viewstudent/MeTdMw9Ndj' class='favourite' "
        + "data='MeTdMw9Ndj'><font color='#009a49'>Good evening</font></a></p>";
// in a Java string literal the slash needs no escaping ("\/" would not compile)
html = html.replaceAll("</{0,1}font.*?>", "");
System.out.println(html);
Output:
<p dir='ltr'><a href='xxxx://viewstudent/MeTdMw9Ndj' class='favourite' data='MeTdMw9Ndj'>Good evening</a></p>
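The two regex patterns above are not interchangeable. The following sketch compares them side by side. The class name FontTagRegexDemo and the helper strip() are hypothetical, written only for this illustration. Only the first pattern carries the (?i) flag, so the second one leaves upper-case <FONT> tags in place.

```java
import java.util.regex.Pattern;

public class FontTagRegexDemo {
    // Pattern from the first answer: case-insensitive, [^>]* runs up to the first '>'
    static final Pattern CASE_INSENSITIVE = Pattern.compile("(?i)</?font[^>]*>");
    // Pattern from the second answer: case-sensitive, lazy .*? up to the first '>'
    static final Pattern CASE_SENSITIVE = Pattern.compile("</{0,1}font.*?>");

    // Hypothetical helper: deletes every match of the pattern, keeping the tag contents
    static String strip(Pattern pattern, String html) {
        return pattern.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String lower = "<a href='x'><font color='#009a49'>Good evening</font></a>";
        String upper = "<a href='x'><FONT COLOR='#009a49'>Good evening</FONT></a>";

        System.out.println(strip(CASE_INSENSITIVE, lower)); // <a href='x'>Good evening</a>
        System.out.println(strip(CASE_SENSITIVE, lower));   // <a href='x'>Good evening</a>
        System.out.println(strip(CASE_INSENSITIVE, upper)); // <a href='x'>Good evening</a>
        // upper-case tags are not matched, so the string comes back unchanged
        System.out.println(strip(CASE_SENSITIVE, upper));
    }
}
```

There is one more difference. In Java regexes, `.` does not match line terminators unless DOTALL is enabled. As a result, `.*?` misses a `<font ...>` tag whose attributes are split across lines, whereas `[^>]*` also matches newlines.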
Just take all the child nodes of each font node (Node.childNodes()) and insert them into the parent node with Element.insertChildren(int index, Collection<? extends Node> children), at the font node's own position in the parent (which can be retrieved with Node.siblingIndex()). Then remove the now-empty font nodes.
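Document doc = Jsoup.parse(webText);
Elements elements = doc.select("font");
for (Element e : elements) {
    // move the font element's children (text and tags) into its parent,
    // at the position the font element currently occupies
    e.parent().insertChildren(e.siblingIndex(), e.childNodes());
}
// the font elements are now empty, so removing them loses no content
elements.remove();
webText = doc.toString();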
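As an alternative to the manual insertChildren/remove loop, jsoup's Elements.unwrap() does the same job in one call. It moves each matched element's children into its parent and then removes the element. The following is a minimal sketch, assuming a jsoup version that provides Elements.unwrap() (please check against the version you use). The class name UnwrapFontDemo and the method removeFontTags() are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class UnwrapFontDemo {
    // Hypothetical helper: drops every <font> element but keeps its children
    static String removeFontTags(String html) {
        Document doc = Jsoup.parse(html);
        // unwrap() replaces each matched element with its child nodes;
        // nested <font> elements are handled because each match is unwrapped in turn
        doc.select("font").unwrap();
        // body().html() returns only the body's inner HTML, without <html>/<head>/<body>
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p dir=\"ltr\"><a href=\"xxxx://viewstudent/MeTdMw9Ndj\" class=\"favourite\" "
                + "data=\"MeTdMw9Ndj\"><font color=\"#009a49\">Good evening</font></a></p>";
        System.out.println(removeFontTags(html));
    }
}
```

Returning doc.body().html() instead of doc.toString() avoids the <html>/<head>/<body> wrapper that jsoup adds when parsing a fragment.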
I have tested the code on Java 7 with several versions of Jsoup (1.7.2, 1.7.3 and 1.8.1). All of them produced the expected result.
Here is my test code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SO27854788 {
    public static void main(String[] args) {
        Document doc = Jsoup.parse("<font color=\"#009a49\">Good evening <font color=\"#009a49\">Good evening</font> <font color=\"#009a49\">Good evening <font color=\"#009a49\">Good evening</font></font> <font color=\"#009a49\">Good evening</font></font><p dir=\"ltr\"><a href=\"xxxx://viewstudent/MeTdMw9Ndj\" class=\"favourite\" data=\"MeTdMw9Ndj\"><font color=\"#009a49\">Good evening</font></a></p><p dir=\"ltr\"><a href=\"xxxx://viewstudent/MeTdMw9Ndj\" class=\"favourite\" data=\"MeTdMw9Ndj\"><font color=\"#009a49\">Good evening. Here are some <span>more tags inside</span></font></a></p>");
        Elements elements = doc.select("font");
        for (Element e : elements) {
            e.parent().insertChildren(e.siblingIndex(), e.childNodes());
        }
        elements.remove();
        System.out.println(doc.toString());
    }
}
And the output:
<html>
<head></head>
<body>
Good evening Good evening Good evening Good evening Good evening
<p dir="ltr"><a href="xxxx://viewstudent/MeTdMw9Ndj" class="favourite" data="MeTdMw9Ndj">Good evening</a></p>
<p dir="ltr"><a href="xxxx://viewstudent/MeTdMw9Ndj" class="favourite" data="MeTdMw9Ndj">Good evening. Here are some <span>more tags inside</span></a></p>
</body>
</html>