Java Safelist: Add <head> Tag to Allowed List
I want to create a whitelist that removes every HTML tag from a piece of data except head, body, and i. For this I used the Safelist class from the jsoup library.
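import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Start from an empty safelist and allow only head, body, and i.
Safelist safe_list = Safelist.none();
safe_list.addTags(new String[] { "head", "body", "i"});
String data = "<head>Title here</head>\n"
    + "<body>\n"
    + "<p><b> paragraph 1</b></p>\n"
    + "<p><i> paragraph 2</i></p>\n"
    + "</body>";
String cleaned_data = Jsoup.clean(data, safe_list);
System.out.println(cleaned_data);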
The expected result is:
<head>
Title here
</head>
<body>
paragraph 1 <i>paragraph 2</i>
</body>
But the result I get is:
<body>
Title here paragraph 1 <i>paragraph 2</i>
</body>
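Although the head tag is in the allowed list, it is removed from the data, unlike the body and i tags. What is wrong with the head tag, and what should I do to keep it in the data?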
Because the real structure of an HTML file is:
<html>
<head>
<title>Page Title</title>
</head>
<body>
</body>
</html>
your code should be written like this:
Safelist safe_list = Safelist.none();
safe_list.addTags(new String[] { "head", "body", "i"});
// The title text now sits inside a <title> element instead of directly in <head>.
String data = "<head><title>Title here</title></head>\n"
    + "<body>\n"
    + "<p><b> paragraph 1</b></p>\n"
    + "<p><i> paragraph 2</i></p>\n"
    + "</body>";
String cleaned_data = Jsoup.clean(data, safe_list);
System.out.println(cleaned_data);
When you use only <head>Title here</head>, Jsoup treats the text between the tags as a text node.
I found a workaround. It may not be the exact solution, but it works for my case. The Jsoup website has the following note:
The cleaner and these safelists assume that you want to clean a body
fragment of HTML (to add user supplied HTML into a templated page),
and not to clean a full HTML document. If the latter is the case,
either wrap the document HTML around the cleaned body HTML, or create
a safelist that allows html and head elements as appropriate.
Because creating a safelist that allows the html and head elements did not work, I took the first suggestion:
Safelist safe_list = Safelist.none();
safe_list.addTags(new String[] {"body", "i"});
// Clean only the body part.
String data = "<body>\n"
    + "<p><b> paragraph 1</b></p>\n"
    + "<p><i> paragraph 2</i></p>\n"
    + "</body>";
String cleaned_data = Jsoup.clean(data, safe_list);
// Put the head back in front of the cleaned body.
cleaned_data = "<head>Title here</head>" + cleaned_data;
System.out.println(cleaned_data);
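The "wrap the document HTML around the cleaned body HTML" suggestion could also be pulled into a small helper. This is only a rough sketch in plain Java: the class HeadWrapper and the methods escapeText and wrap are hypothetical names, not part of jsoup, and the cleanedBody string in main is a stand-in for whatever Jsoup.clean returns in your case. It also assumes the title should be escaped and placed in a title element, which the workaround above does not do.

```java
final class HeadWrapper {
    // Escapes the characters that would otherwise be parsed as markup in the title.
    static String escapeText(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Builds a full document around body HTML that has already been cleaned.
    static String wrap(String title, String cleanedBody) {
        return "<html>\n"
             + "<head><title>" + escapeText(title) + "</title></head>\n"
             + "<body>\n" + cleanedBody + "\n</body>\n"
             + "</html>";
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by Jsoup.clean(data, safe_list).
        String cleanedBody = "paragraph 1 <i>paragraph 2</i>";
        System.out.println(wrap("Title here", cleanedBody));
    }
}
```

Since the head is built by your own code and never passes through the cleaner, only the body content has to satisfy the safelist.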