Java Safelist Add <head> Tag to Allowed List

I want to create a safelist that strips every HTML tag from a piece of data except head, body and i. To do this, I used the Safelist class from the jsoup library.

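// requires: import org.jsoup.Jsoup; import org.jsoup.safety.Safelist;
Safelist safe_list = Safelist.none();
safe_list.addTags("head", "body", "i");
// Java string literals cannot span lines, so the HTML is concatenated.
String data = "<head>Title here</head>\n"
            + "<body>\n"
            + "   <p><b> paragraph 1</b></p>\n"
            + "   <p><i> paragraph 2</i></p>\n"
            + "</body>";
String cleaned_data = Jsoup.clean(data, safe_list);
System.out.println(cleaned_data);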

The expected result is:

<head>
 Title here
</head>
<body>
 paragraph 1 <i>paragraph 2</i>
</body>

But the result I get is:

<body>
 Title here paragraph 1 <i>paragraph 2</i>
</body>

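Although the head tag is in the allowed list, it is removed from the data, unlike the body and i tags. What is wrong with the head tag, and what should I do to keep it in the data?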

Because the actual structure of an HTML document is:

<html>
 <head>
   <title>Page Title</title>
 </head>
 <body>
 </body>
</html>

your code should look like this:

Safelist safe_list = Safelist.none();
safe_list.addTags("head", "body", "i");
// The title text is wrapped in a <title> element inside <head>.
String data = "<head><title>Title here</title></head>\n"
            + "<body>\n"
            + "   <p><b> paragraph 1</b></p>\n"
            + "   <p><i> paragraph 2</i></p>\n"
            + "</body>";
String cleaned_data = Jsoup.clean(data, safe_list);
System.out.println(cleaned_data);

When you write only <head>Title here</head>, jsoup treats the text between the tags as a text node.

I found a workaround. It may not be the exact solution, but it works for my case. The jsoup website has the following note:

The cleaner and these safelists assume that you want to clean a body fragment of HTML (to add user supplied HTML into a templated page), and not to clean a full HTML document. If the latter is the case, either wrap the document HTML around the cleaned body HTML, or create a safelist that allows html and head elements as appropriate.

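Since creating a safelist that allows the html and head elements did not work for me, I went with the first suggestion and wrapped the document HTML around the cleaned body: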

Safelist safe_list = Safelist.none();
safe_list.addTags("body", "i");
// Only the body is cleaned; the head is added back afterwards.
String data = "<body>\n"
            + "   <p><b> paragraph 1</b></p>\n"
            + "   <p><i> paragraph 2</i></p>\n"
            + "</body>";
String cleaned_data = Jsoup.clean(data, safe_list);
cleaned_data = "<head>Title here</head>" + cleaned_data;
System.out.println(cleaned_data);
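
If it helps, the "wrap the document HTML around the cleaned body HTML" idea can be pulled into a small helper so the concatenation lives in one place. This is only a rough sketch: DocumentWrapper and wrap are hypothetical names of my own, not part of jsoup, and the head markup is assumed to be trusted because it never goes through the cleaner. The literal in main stands in for whatever Jsoup.clean returns.

```java
// Hypothetical helper, not part of jsoup: wraps an already-cleaned body
// fragment in a document shell together with a trusted, author-supplied head.
final class DocumentWrapper {

    private DocumentWrapper() {}

    /**
     * @param trustedHeadHtml inner HTML of the head element; assumed trusted,
     *                        it is NOT cleaned here
     * @param cleanedBodyHtml the cleaned body fragment, e.g. the string
     *                        returned by Jsoup.clean(data, safelist)
     */
    static String wrap(String trustedHeadHtml, String cleanedBodyHtml) {
        return "<html>"
                + "<head>" + trustedHeadHtml + "</head>"
                + "<body>" + cleanedBodyHtml + "</body>"
                + "</html>";
    }

    public static void main(String[] args) {
        // Stand-in for the output of Jsoup.clean(data, safelist).
        String cleanedBody = "paragraph 1 <i>paragraph 2</i>";
        System.out.println(wrap("<title>Title here</title>", cleanedBody));
    }
}
```

Only the untrusted fragment goes through the safelist, and the head is supplied by your own code, so nothing user-supplied should end up in trustedHeadHtml.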

https://jsoup.org/apidocs/org/jsoup/safety/Safelist.html