How to sort HTML elements by <TD> date and delete duplicates by <A> in Java?
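I have an HTML snippet that I need to change in Java. I have been parsing it with JSOUP, but I feel that may not be efficient. I have uploaded pictures here of what I am looking for: sort by the date in the TD from newest to oldest, and if there are duplicate A hrefs, remove the node as a whole.
I also have a list of the given div names, which will be included: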
ObservableList<String> names;
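I think one approach might be to iterate over the list and, for each name, grab everything below it until the next div is hit? I feel like this is a simple problem that I am overthinking. Thanks for your help!
for (String name : names) {
    // grab the rows under this name until the next <div> is hit
}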
[Image: before example]
[Image: sorted without duplicates example]
HTML (unsorted, with duplicates):
<div>CHTR</div>
<td width="130" align="right" style="white-space:nowrap">Mar-04-20 08:54AM </td>
</br>
<a sname='CHTR' href="https://test.com/news/why-charter-chtr-stock-might-135401270.html" target="_blank" class="tab-link-news">Why Charter (CHTR) Stock Might be a Great Pick</a></br>
<td width="130" align="right">Mar-04-20 08:53AM </td>
</br>
<a sname='CHTR' href="https://test.com/news/charter-offers-senior-unsecured-notes-135400843.html" target="_blank" class="tab-link-news">Charter Offers Senior Unsecured Notes</a>.
</br>
<div>PEGI</div>
<td width="130" align="right" style="white-space:nowrap">Mar-04-20 12:49 PM </td>
</br>
<a sname='PEGI' href="www.test.com/news/3548648-pattern-energy-low-odds-of-competing-bid-raymond-james-says">Pattern Energy has low odds of competing bid, Raymond James says</a></br>
<div>CHTR</div>
<td width="130" align="right" style="white-space:nowrap">Mar-04-20 12:39 PM </td>
</br>
<a sname='CHTR' href="www.test.com/news/3548649-charter-offering-senior-notes">Charter offering more senior notes</a></br>
<div>PEGI</div>
<td width="130" align="right" style="white-space:nowrap">Mar-04-20 12:49 PM </td>
</br>
<a sname='PEGI' href="www.test.com/news/3548648-pattern-energy-low-odds-of-competing-bid-raymond-james-says">Pattern Energy has low odds of competing bid, Raymond James says</a></br>
<td width="130" align="right" style="white-space:nowrap">Mar-04-20 08:40 AM </td>
</br>
<a sname='PEGI' href="www.test.com/news/greatbuy">Great buy with PEGI</a></br>
Parse the HTML and add the entries to a list of custom objects, then sort the list on two keys using two comparators.
// Primary key: the name (ticker), ascending.
Comparator<MyObject> compareByName = Comparator
        .comparing(MyObject::getName);
// Secondary key: the date, newest first.
Comparator<MyObject> compareByDate = Comparator
        .comparing(MyObject::getDate).reversed();
myList.sort(compareByName.thenComparing(compareByDate));
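To make the two-comparator idea concrete, here is a minimal sketch that uses only the JDK. The `Article` record, the `parseDate` helper and the class name are hypothetical names made up for illustration, and it assumes Java 16+ (for records) and that the TD text and the href have already been extracted from the HTML. The sample mixes `08:54AM` and `12:49 PM`, so the sketch strips the space before AM/PM before parsing.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class ArticleSortSketch {

    // Hypothetical value type: one news row (ticker name, timestamp, link).
    record Article(String name, LocalDateTime date, String href) {}

    // Matches text such as "Mar-04-20 08:54AM" once the optional space before AM/PM is removed.
    private static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("MMM-dd-yy hh:mma")
            .toFormatter(Locale.ENGLISH);

    // Assumption: every TD holds a date in one of the two shapes seen in the sample.
    static LocalDateTime parseDate(String raw) {
        String normalized = raw.trim().replaceAll("\\s+(?=[AaPp][Mm]$)", "");
        return LocalDateTime.parse(normalized, FORMAT);
    }

    // Sorts a copy of the list by name, then by date with the newest entry first.
    static List<Article> sortByNameThenNewest(List<Article> articles) {
        Comparator<Article> byName = Comparator.comparing(Article::name);
        Comparator<Article> byNewest = Comparator.comparing(Article::date).reversed();
        List<Article> sorted = new ArrayList<>(articles);
        sorted.sort(byName.thenComparing(byNewest));
        return sorted;
    }

    public static void main(String[] args) {
        List<Article> articles = List.of(
                new Article("PEGI", parseDate("Mar-04-20 08:40 AM "), "www.test.com/news/greatbuy"),
                new Article("CHTR", parseDate("Mar-04-20 08:53AM "),
                        "https://test.com/news/charter-offers-senior-unsecured-notes-135400843.html"),
                new Article("CHTR", parseDate("Mar-04-20 12:39 PM "),
                        "www.test.com/news/3548649-charter-offering-senior-notes"));
        sortByNameThenNewest(articles).forEach(System.out::println);
    }
}
```

Comparing real `LocalDateTime` values rather than the raw strings matters here, because as text "12:39 PM" would sort before "08:54AM".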
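Not sure I understood correctly, but how about parsing the HTML into a HashMap to get key-value pairs? That would at least eliminate the duplicates. Then you could do something like this: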
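// The map's values are already unique per key; copy them into a list.
List<Employee> employeeById = new ArrayList<>(map.values());
// Requires Employee to implement Comparable<Employee> (here, ordering by id).
Collections.sort(employeeById);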
which gives the result:
[Employee{id=1, name='Mher'},
Employee{id=2, name='George'},
Employee{id=8, name='John'},
Employee{id=22, name='Annie'}]
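Applied to the news rows from the question, the same map idea might look like the sketch below. It keys the map on the href so that a repeated link is dropped as a whole, then sorts what is left newest first. `NewsItem`, `dedupeAndSort` and the class name are hypothetical, the dates are assumed to be parsed into `LocalDateTime` already, and a `LinkedHashMap` is swapped in for the plain `HashMap` so the first occurrence of each href is the one that is kept.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DedupeByHrefSketch {

    // Hypothetical value type: one news row (ticker name, timestamp, link).
    record NewsItem(String name, LocalDateTime date, String href) {}

    // Keeps the first item seen for each href, then orders the survivors newest first.
    static List<NewsItem> dedupeAndSort(List<NewsItem> items) {
        Map<String, NewsItem> byHref = new LinkedHashMap<>();
        for (NewsItem item : items) {
            byHref.putIfAbsent(item.href(), item); // a later duplicate href is ignored
        }
        List<NewsItem> result = new ArrayList<>(byHref.values());
        result.sort(Comparator.comparing(NewsItem::date).reversed());
        return result;
    }

    public static void main(String[] args) {
        String pegiBid = "www.test.com/news/3548648-pattern-energy-low-odds-of-competing-bid-raymond-james-says";
        List<NewsItem> items = List.of(
                new NewsItem("PEGI", LocalDateTime.of(2020, 3, 4, 12, 49), pegiBid),
                new NewsItem("CHTR", LocalDateTime.of(2020, 3, 4, 12, 39),
                        "www.test.com/news/3548649-charter-offering-senior-notes"),
                new NewsItem("PEGI", LocalDateTime.of(2020, 3, 4, 12, 49), pegiBid), // duplicate href
                new NewsItem("PEGI", LocalDateTime.of(2020, 3, 4, 8, 40), "www.test.com/news/greatbuy"));
        dedupeAndSort(items).forEach(System.out::println);
    }
}
```

If the output should stay grouped per div name, as in the question, the final comparator could be replaced with a name-then-date one.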