Extract text between two (different) HTML tags using jsoup
I have the following HTML snippet:
<td>
<span class="detailh2" style="margin:0px">This month: </span>2 145
<span class="detailh2">Total: </span> 31 704
<span class="detailh2">Last: </span> 30.12.2021
</td>
My goal is to extract the part of the code that follows the Total: span, so the output should look like this:
31 704
This is what I have so far:
String total = doc.select("td:contains(Total:)").get(0).ownText();
which returns:
2 145 31 704 30.12.2021
As you can see, all three values are merged into one confusing string. Is there a way to return them as an array (list) instead?
["2 145", "31 704", "30.12.2021"]
(I don't actually need the array; I'm only interested in the Total value.)
Use the Element.nextSibling() method. In the sample code below, the desired values are collected into a List of String:
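import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

String html = "<td>\n"
        + " <span class=\"detailh2\" style=\"margin:0px\">This month: </span>2 145 \n"
        + " <span class=\"detailh2\">Total: </span> 31 704 \n"
        + " <span class=\"detailh2\">Last: </span> 30.12.2021 \n"
        + "</td>";
List<String> valuesList = new ArrayList<>();
Document doc = Jsoup.parse(html);
Elements elements = doc.select("span");
for (Element a : elements) {
    // The value is the text node that directly follows each span
    Node node = a.nextSibling();
    if (node instanceof TextNode) { // also skips a missing (null) sibling
        valuesList.add(((TextNode) node).text().trim());
    }
}
// Display valuesList in the console window:
for (String value : valuesList) {
    System.out.println(value);
}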
It will display the following in the console window:
2 145
31 704
30.12.2021
If you only want the value for Total:, you could try this:
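import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

String html = "<td>\n"
        + " <span class=\"detailh2\" style=\"margin:0px\">This month: </span>2 145 \n"
        + " <span class=\"detailh2\">Total: </span> 31 704 \n"
        + " <span class=\"detailh2\">Last: </span> 30.12.2021 \n"
        + "</td>";
String totalValue = "N/A";
Document doc = Jsoup.parse(html);
Elements elements = doc.select("span");
for (Element a : elements) {
    // Check the span's own text; no call to before() is needed here
    if (a.text().contains("Total:")) {
        Node node = a.nextSibling();
        if (node instanceof TextNode) { // also skips a missing (null) sibling
            totalValue = "Total: --> " + ((TextNode) node).text().trim();
        }
        break;
    }
}
// Display the value in the console window:
System.out.println(totalValue);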
The code above will display the following in the console window:
Total: --> 31 704
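As a variation on the loop above, the label match can probably be pushed into the selector itself. The following is only a sketch, assuming a jsoup version that provides selectFirst (1.11.1 or later); the class name TotalExtractor and the helper valueAfterLabel are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper class, not part of jsoup.
class TotalExtractor {

    // Returns the text that directly follows the first <span> whose own text
    // contains the given label, or null if no such span or text node exists.
    static String valueAfterLabel(String html, String label) {
        Document doc = Jsoup.parse(html);
        // :containsOwn matches on the element's own text (case-insensitive)
        Element span = doc.selectFirst("span:containsOwn(" + label + ")");
        if (span == null) {
            return null;
        }
        Node next = span.nextSibling();
        if (next instanceof TextNode) {
            // text() normalises whitespace; trim() drops the outer spaces
            return ((TextNode) next).text().trim();
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<td>\n"
                + " <span class=\"detailh2\" style=\"margin:0px\">This month: </span>2 145 \n"
                + " <span class=\"detailh2\">Total: </span> 31 704 \n"
                + " <span class=\"detailh2\">Last: </span> 30.12.2021 \n"
                + "</td>";
        System.out.println(valueAfterLabel(html, "Total:")); // prints 31 704
    }
}
```

Returning null when the label is missing leaves the choice of fallback (such as the "N/A" used above) to the caller.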