How do I loop through divs using jsoup?
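Hi everyone, I'm using jsoup in a Java web application in IntelliJ. I'm trying to scrape port call event data from a ship tracking website and store it in a MySQL database.
The event data is organised in divs with the class name table-group, and the values are in another div with the class name table-row.
My problem is that the container divs for all the rows share the same class name, and I'm trying to loop through each row and push the data to the database. So far I have managed to create a Java class that scrapes the first row.
How can I loop through each row and store those values in my database? Should I create an ArrayList to store the values?
Here is my scraper class: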
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Scarper {
    private static Document doc;

    public static void main(String[] args) {
        final String url =
            "https://www.myshiptracking.com/ports-arrivals-departures/?mmsi=&pid=277&type=0&time=&pp=20";
        try {
            doc = Jsoup.connect(url).get();
        } catch (IOException e) {
            e.printStackTrace();
        }
        Events();
    }

    public static void Events() {
        Elements elm = doc.select("div.table-group:nth-of-type(2) > .table-row");
        List<String> arrayList = new ArrayList<>();
        for (Element ele : elm) {
            String event = ele.select("div.col:nth-of-type(2)").text();
            String time = ele.select("div.col:nth-of-type(3)").text();
            String port = ele.select("div.col:nth-of-type(4)").text();
            String vessel = ele.select(".td_vesseltype.col").text();
            Event ev = new Event();
            System.out.println(event);
            System.out.println(time);
            System.out.println(port);
            System.out.println(vessel);
        }
    }
}
Sample of the div classes I want to scrape:
<div style="box-sizing: border-box;padding: 0px 10px 10px 10px;">
  <div class="cs-table">
    <div class="heading">
      <div class="col" style="width: 10px"></div>
      <div class="col" style="width: 110px">Event</div>
      <div class="col" style="width: 120px">Time (<span class="tooltip" title="My Time: In your current TimeZone">MT</span>)</div>
      <div class="col" style="width: 150px">Port</div>
      <div class="col">Vessel</div>
    </div>
    <div class="table-group">
      <div class="table-row">
        <div class="col"><i class="fa fa-sign-out red"></i></div>
        <div class="col">Departure</div>
        <div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
        <div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-belfast-in-gb-united-kingdom-id-101">BELFAST</a></div>
        <div class="col td_vesseltype"><img src="/icons/icon7_511.png"><span class="padding_18"><a href="/vessels/wilson-blyth-mmsi-314544000-imo-9124419">WILSON BLYTH</a> [GB]</span></div>
      </div>
    </div>
    <div class="table-group">
      <div class="table-row">
        <div class="col"><i class="fa fa-flag-checkered green"></i></div>
        <div class="col">Arrival</div>
        <div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
        <div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-hunters-quay-in-gb-united-kingdom-id-218">HUNTERS QUAY</a></div>
        <div class="col td_vesseltype"><img src="/icons/icon6_511.png"><span class="padding_18"><a href="/vessels/sound-of-soay-mmsi-235101063-imo-9665229">SOUND OF SOAY</a> [GB]</span></div>
      </div>
    </div>
    <div class="table-group">
      <div class="table-row">
        <div class="col"><i class="fa fa-sign-out red"></i></div>
        <div class="col">Departure</div>
        <div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
        <div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-largs-in-gb-united-kingdom-id-1602">LARGS</a></div>
        <div class="col td_vesseltype"><img src="/icons/icon6_511.png"><span class="padding_18"><a href="/vessels/loch-shira-mmsi-235053239-imo-9376919">LOCH SHIRA</a> [GB]</span></div>
      </div>
    </div>
    <div class="table-group">
      <div class="table-row">
        <div class="col"><i class="fa fa-sign-out red"></i></div>
        <div class="col">Departure</div>
        <div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
        <div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/><a href="/ports/port-of-ryde-in-gb-united-kingdom-id-1629">RYDE</a></div>
        <div class="col td_vesseltype"><img src="/icons/icon4_511.png"><span class="padding_18"><a href="/vessels/island-flyer-mmsi-235117772-imo-9737797">ISLAND FLYER</a> [GB]</span></div>
      </div>
    </div>
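You can start by looping through the rows of the table. In your sample, the selector div.table-group:nth-of-type(2) matches only the one group that is the second div inside .cs-table (the first group after the heading), which is why you only get the first row.
The selector for the table is .cs-table, so you can get the table with
Element table = doc.select(".cs-table").first();
Next, you can get the rows of the table with the selector div.table-row:
Elements rows = table.select("div.table-row");
Now you can loop through all the rows and extract the data from each one. The code should look like this:
Element table = doc.select(".cs-table").first();
// Search inside the table only, so rows elsewhere on the page are ignored
Elements rows = table.select("div.table-row");
for (Element row : rows) {
    String event = row.select("div.col:nth-of-type(2)").text();
    String time = row.select("div.col:nth-of-type(3)").text();
    String port = row.select("div.col:nth-of-type(4)").text();
    String vessel = row.select(".td_vesseltype.col").text();
    System.out.println(event + "-" + time + " " + port + " " + vessel);
    System.out.println("---------------------------");
    // Do stuff with data here
}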
Now it is up to you whether to collect the data in an array or list inside the loop for later use, or to insert it into the database directly.
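If you go the list route, one possible shape is sketched below. It is only a sketch under assumptions: the PortEvent and PortEventStore classes, the port_events table and its column names are all made up for illustration and are not part of your code or the site. It also assumes the time column always reads like "2022-02-14 16:51", as in your sample. Only standard JDK classes (java.time and java.sql) are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

/** One scraped row. Hypothetical class; the field names are illustrative. */
final class PortEvent {
    // Assumption: the time text looks like "2022-02-14 16:51", as in the sample HTML.
    private static final DateTimeFormatter TIME_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    final String event;
    final LocalDateTime time;
    final String port;
    final String vessel;

    PortEvent(String event, String time, String port, String vessel) {
        this.event = event.trim();
        // Throws DateTimeParseException if the text does not match the assumed format.
        this.time = LocalDateTime.parse(time.trim(), TIME_FORMAT);
        this.port = port.trim();
        this.vessel = vessel.trim();
    }
}

/** Hypothetical helper that writes a list of events to the database. */
final class PortEventStore {
    // Assumption: table and column names are placeholders; adjust to your own schema.
    static final String INSERT_SQL =
            "INSERT INTO port_events (event, event_time, port, vessel) VALUES (?, ?, ?, ?)";

    /** Inserts all events as one JDBC batch using a parameterized statement. */
    static void insertAll(Connection connection, List<PortEvent> events) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            for (PortEvent e : events) {
                statement.setString(1, e.event);
                statement.setTimestamp(2, Timestamp.valueOf(e.time));
                statement.setString(3, e.port);
                statement.setString(4, e.vessel);
                statement.addBatch();
            }
            statement.executeBatch();
        }
    }
}
```

Inside the loop you would then call events.add(new PortEvent(event, time, port, vessel)) on a List<PortEvent> created before the loop, and after the loop pass that list to PortEventStore.insertAll together with a Connection obtained from your MySQL driver. Collecting first and inserting in one batch keeps the scraping and the database code separate, and the parameterized statement avoids building SQL strings from scraped text.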