How to sort HTML elements by <TD> date and delete duplicates by <A> in Java?

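I have an HTML snippet that I need to change in Java. I have been using jsoup to parse it, but I feel it may not be very efficient. I have uploaded pictures of what I am looking for here. The entries should be sorted by the TD date from newest to oldest, and if there are duplicate A hrefs, the node should be removed as a whole. I also have an array list of the given divs, which will be included: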

ObservableList<String> names; 

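I think one approach might be to iterate over the list and, for each name, grab everything below it until the next div is hit? I feel like this is a simple problem and I am overthinking it. Thanks for your help!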

for (String name : names) {
}

Before example

Sorted without duplicates example

HTML (sorted without duplicates):

<div>CHTR</div>
<td width="130" align="right" style="white-space:nowrap">Mar-04-20 08:54AM&nbsp;&nbsp;</td>
</br>
<a sname='CHTR' href="https://test.com/news/why-charter-chtr-stock-might-135401270.html" target="_blank" class="tab-link-news">Why Charter (CHTR) Stock Might be a Great Pick</a></br>
<td width="130" align="right">Mar-04-20 08:53AM&nbsp;&nbsp;</td>
</br>
<a sname='CHTR' href="https://test.com/news/charter-offers-senior-unsecured-notes-135400843.html" target="_blank" class="tab-link-news">Charter Offers Senior Unsecured Notes</a>. 
</br>
<div>PEGI</div>
<td width="130" align="right" style="white-space:nowrap">Mar-04-20 12:49 PM&nbsp;&nbsp;</td>
</br>
<a sname='PEGI' href="www.test.com/news/3548648-pattern-energy-low-odds-of-competing-bid-raymond-james-says">Pattern Energy has low odds of competing bid, Raymond James says</a></br>
<div>CHTR</div>
<td width="130" align="right" style="white-space:nowrap">Mar-04-20 12:39 PM&nbsp;&nbsp;</td>
</br>
<a sname='CHTR' href="www.test.com/news/3548649-charter-offering-senior-notes">Charter offering more senior notes</a></br>
<div>PEGI</div>
<td width="130" align="right" style="white-space:nowrap">Mar-04-20 12:49 PM&nbsp;&nbsp;</td>
</br>
<a sname='PEGI' href="www.test.com/news/3548648-pattern-energy-low-odds-of-competing-bid-raymond-james-says">Pattern Energy has low odds of competing bid, Raymond James says</a></br>
<td width="130" align="right" style="white-space:nowrap">Mar-04-20 08:40 AM&nbsp;&nbsp;</td>
</br>
<a sname='PEGI' href="www.test.com/news/greatbuy">Great buy with PEGI</a></br>    

Parse the HTML, create a custom object for each entry and add it to a list, then sort the list on two keys by chaining two comparators.

    // Order by name first, then newest date first within each name.
    Comparator<MyObject> compareByName = Comparator
            .comparing(MyObject::getName);

    Comparator<MyObject> compareByDateDesc = Comparator
            .comparing(MyObject::getDate).reversed();

    myList.sort(compareByName.thenComparing(compareByDateDesc));
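
For the date comparison to mean anything, getDate should return a real date type rather than the raw cell text. Here is a minimal sketch of how that could look, assuming the td text and the a attributes have already been pulled out of the document (for example with jsoup). The class names `NewsSortSketch` and `Article` and the `parseDate` helper are made up for illustration, and the pattern is only a guess based on the sample rows in the question.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class NewsSortSketch {

    // Hypothetical holder for one <td>/<a> pair; not part of the original post.
    static final class Article {
        final String name;         // the sname attribute, e.g. "CHTR"
        final LocalDateTime date;  // parsed from the <td> text
        final String href;

        Article(String name, String rawDate, String href) {
            this.name = name;
            this.date = parseDate(rawDate);
            this.href = href;
        }

        String getName() { return name; }
        LocalDateTime getDate() { return date; }
    }

    // Assumed pattern, matching cells such as "Mar-04-20 08:54AM".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMM-dd-yy hh:mma", Locale.ENGLISH);

    // The sample mixes "08:54AM" with "12:49 PM" and pads cells with &nbsp;,
    // so the text is normalized before parsing.
    static LocalDateTime parseDate(String raw) {
        String cleaned = raw.replace('\u00a0', ' ').trim()
                .replaceAll("\\s+(?=[AP]M$)", "");
        return LocalDateTime.parse(cleaned, FORMAT);
    }

    // Returns a sorted copy: by name, then newest first within each name.
    static List<Article> sortByNameThenNewest(List<Article> items) {
        Comparator<Article> byName = Comparator.comparing(Article::getName);
        Comparator<Article> byDateDesc =
                Comparator.comparing(Article::getDate).reversed();
        List<Article> sorted = new ArrayList<>(items);
        sorted.sort(byName.thenComparing(byDateDesc));
        return sorted;
    }

    public static void main(String[] args) {
        List<Article> items = new ArrayList<>();
        items.add(new Article("PEGI", "Mar-04-20 12:49 PM", "www.test.com/news/greatbuy"));
        items.add(new Article("CHTR", "Mar-04-20 08:53AM", "https://test.com/news/charter-offers-senior-unsecured-notes-135400843.html"));
        items.add(new Article("CHTR", "Mar-04-20 08:54AM", "https://test.com/news/why-charter-chtr-stock-might-135401270.html"));
        for (Article a : sortByNameThenNewest(items)) {
            System.out.println(a.getName() + " " + a.getDate() + " " + a.href);
        }
    }
}
```

If you only want newest-to-oldest across all names, sorting with the date comparator alone should be enough.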

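Not sure if I understood you correctly, but how about parsing the HTML into a HashMap so you get key-value pairs? That would at least eliminate the duplicates. Then you can do something like this: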

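// Collections.sort(list) only compiles if Employee implements Comparable<Employee>;
// the output below is ordered by id.
List<Employee> employeeById = new ArrayList<>(map.values());
Collections.sort(employeeById);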

which gives:

[Employee{id=1, name='Mher'}, 
Employee{id=2, name='George'}, 
Employee{id=8, name='John'}, 
Employee{id=22, name='Annie'}]

Source: https://www.baeldung.com/java-hashmap-sort
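
Applied to the news rows from the question, the same idea might look roughly like the sketch below: key the map by href so repeated links collapse into one entry, then sort the remaining values newest first. `NewsDedupSketch`, `NewsItem` and `dedupeAndSort` are names invented for this example, and the dates are built directly instead of being parsed from the td text to keep it short. A LinkedHashMap with putIfAbsent is used instead of a plain HashMap so that the first occurrence of each link is the one that is kept.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NewsDedupSketch {

    // Hypothetical value type for one parsed news row (not from the original post).
    static final class NewsItem {
        final String href;
        final LocalDateTime date;
        final String title;

        NewsItem(String href, LocalDateTime date, String title) {
            this.href = href;
            this.date = date;
            this.title = title;
        }

        LocalDateTime getDate() { return date; }
    }

    // Keeps the first item seen for each href, then orders the survivors newest first.
    static List<NewsItem> dedupeAndSort(List<NewsItem> items) {
        Map<String, NewsItem> byHref = new LinkedHashMap<>();
        for (NewsItem item : items) {
            byHref.putIfAbsent(item.href, item);
        }
        List<NewsItem> result = new ArrayList<>(byHref.values());
        result.sort(Comparator.comparing(NewsItem::getDate).reversed());
        return result;
    }

    public static void main(String[] args) {
        String pattern = "www.test.com/news/3548648-pattern-energy-low-odds-of-competing-bid-raymond-james-says";
        List<NewsItem> items = new ArrayList<>();
        items.add(new NewsItem("www.test.com/news/greatbuy",
                LocalDateTime.of(2020, 3, 4, 8, 40), "Great buy with PEGI"));
        items.add(new NewsItem(pattern,
                LocalDateTime.of(2020, 3, 4, 12, 49), "Pattern Energy has low odds of competing bid"));
        items.add(new NewsItem("www.test.com/news/3548649-charter-offering-senior-notes",
                LocalDateTime.of(2020, 3, 4, 12, 39), "Charter offering more senior notes"));
        items.add(new NewsItem(pattern,
                LocalDateTime.of(2020, 3, 4, 12, 49), "Pattern Energy has low odds of competing bid"));

        // Three rows remain: the duplicate Pattern Energy link appears only once.
        for (NewsItem item : dedupeAndSort(items)) {
            System.out.println(item.getDate() + " " + item.title);
        }
    }
}
```

The sorted list can then be written back out as td/a pairs in whatever way you build the final HTML.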