Problems using jsoup to parse an HTML table

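A few days ago I asked a question about parsing an HTML table with jsoup. @luksch helped me and I was able to solve that problem. The question was how to parse part of an HTML file that contains many TR and TD elements in order to select specific text from it (the "Group Block" TABLE).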

HTML code:

<TABLE SUMMARY="Topline" WIDTH="100%">
<TR><TD HEIGHT=16>&nbsp;</TD></TR>  <!-- For the menu bar -->
<TR>
<TD VALIGN=MIDDLE ALIGN=LEFT WIDTH="30%">
<FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Xymon</B></FONT
</TD>
<TD VALIGN=MIDDLE ALIGN=CENTER WIDTH="40%">
<CENTER><FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Current Status</B></FONT></CENTER>
</TD>
<TD VALIGN=MIDDLE ALIGN=RIGHT WIDTH="30%">
<FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Thu Jul 23 16:05:06 2015</B></FONT>
</TD>
</TR>
<TR>
<TD COLSPAN=3> <HR WIDTH="100%"> </TD>
</TR>
</TABLE>
<BR>
<A NAME=hosts-blk>&nbsp;</A>

<CENTER><TABLE SUMMARY="Group Block" BORDER=0 CELLPADDING=2>
<TR><TD VALIGN=MIDDLE ROWSPAN=2><CENTER><FONT COLOR="#FFFFF0" SIZE="+1">&nbsp;</FONT></CENTER></TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45> 
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbd"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbd</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbgen"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbgen</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbtest"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbtest</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?conn"><FONT COLOR="#87a9e5" SIZE="-1"><B>conn</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?cpu"><FONT COLOR="#87a9e5" SIZE="-1"><B>cpu</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?disk"><FONT COLOR="#87a9e5" SIZE="-1"><B>disk</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?files"><FONT COLOR="#87a9e5" SIZE="-1"><B>files</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?hobbitd"><FONT COLOR="#87a9e5" SIZE="-1"><B>hobbitd</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?http"><FONT COLOR="#87a9e5" SIZE="-1"><B>http</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?info"><FONT COLOR="#87a9e5" SIZE="-1"><B>info</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?memory"><FONT COLOR="#87a9e5" SIZE="-1"><B>memory</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?msgs"><FONT COLOR="#87a9e5" SIZE="-1"><B>msgs</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?ports"><FONT COLOR="#87a9e5" SIZE="-1"><B>ports</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?procs"><FONT COLOR="#87a9e5" SIZE="-1"><B>procs</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?trends"><FONT COLOR="#87a9e5" SIZE="-1"><B>trends</B></FONT></A> </TD>
</TR> 

<TR><TD COLSPAN=15><HR WIDTH="100%"></TD></TR>
<TR class=line>
<TD NOWRAP><A NAME="hostname1">&nbsp;</A>
<FONT SIZE="+1" COLOR="#FFFFCC" FACE="Tahoma, Arial, Helvetica"><span title="127.0.0.1">hostname1</span></FONT><TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1.&amp;SERVICE=bbd"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbd:green:268d04h25m" TITLE="bbd:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=bbgen"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbgen:green:268d04h24m" TITLE="bbgen:green:268d04h24m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=bbtest"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbtest:green:268d04h25m" TITLE="bbtest:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=conn"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="conn:green:268d04h25m" TITLE="conn:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=cpu"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="cpu:green:169d00h15m" TITLE="cpu:green:169d00h15m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=disk"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="disk:green:268d04h25m" TITLE="disk:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=files"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="files:clear:268d04h25m" TITLE="files:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=hobbitd"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="hobbitd:green:169d01h05m" TITLE="hobbitd:green:169d01h05m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:268d04h19m" TITLE="http:green:268d04h19m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=info"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="info:green:127.0.0.1" TITLE="info:green:127.0.0.1" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=memory"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="memory:green:268d04h25m" TITLE="memory:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=msgs"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="msgs:green:268d04h20m" TITLE="msgs:green:268d04h20m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=ports"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="ports:clear:268d04h25m" TITLE="ports:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=procs"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="procs:clear:268d04h25m" TITLE="procs:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=trends"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="trends:green:" TITLE="trends:green:" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
</TR>

<TR class=line>
<TD NOWRAP><A NAME="hostname2">&nbsp;</A>
<FONT SIZE="+1" COLOR="#FFFFCC" FACE="Tahoma, Arial, Helvetica"><span title="127.0.0.2">hostname2</span></FONT><TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=bbd"><IMG SRC="/hobbit/gifs/static/red.gif" ALT="bbd:red:16d06h46m" TITLE="bbd:red:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=conn"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="conn:green:16d06h46m" TITLE="conn:green:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:16d06h46m" TITLE="http:green:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=info"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="info:green:127.0.0.2" TITLE="info:green:127.0.0.2" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=trends"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="trends:green:" TITLE="trends:green:" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
</TR>

</TABLE></CENTER><BR>
<BR><BR>

I have solved the first part (the "Group Block" TABLE with bbd, bbgen, bbtest, etc.):

ArrayList<String> groupBlock = new ArrayList<String>();
Object[] objPlace;
Element table = document.select("TABLE").get(1); // select the second table: "Group Block"
Elements rows = table.select("TR");
for (int i = 0; i < rows.size(); i++) {
    Element row = rows.get(i);
    Elements cols = row.select("TD");
    for (Element col : cols) {
        switch (col.text()) {
            case "bbd":
            case "bbgen":
            case "bbtest":
            //...more cases
                groupBlock.add(col.text());
                break;
            default:
                break;
        }
    }
}
objPlace = groupBlock.toArray();

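Now I have to parse the two hostnames (hostname1 and hostname2) and put each into its own TextView, but the problem is that the hostnames may change in the future. I also have to parse the IMG SRC in each TD, for example: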

<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:268d04h19m" TITLE="http:green:268d04h19m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>

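I only need the IMG SRC value, /hobbit/gifs/static/green.gif, and I have to prepend the rest of the URL to it (http://example.com/hobbit/gifs/static/green.gif) so that I can fetch the image and put it in another field of the XML layout. I have to do this for every TD in the HTML file that contains an IMG SRC.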

I know that once I have the image URL, I have to do something like this:

InputStream input = new java.net.URL(imgSrc).openStream();
bitmap = BitmapFactory.decodeStream(input);
ImageView logoimg = (ImageView) findViewById(R.id.logo);
logoimg.setImageBitmap(bitmap);

imgSrc should be an array containing all the IMG SRC values.

I don't know how to get started with the steps before that; I am new to jsoup and Android.

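You can query for the <td> that contains the hostname element, then go up to its parent, the <tr>. From there, select all the child <td> elements again; these are the cells that contain the links you want. Something along these lines: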

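import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document document = Jsoup.parse(html);
Element table = document.select("TABLE").get(1); // second table: "Group Block"
// Only the host rows contain an <a name="...">; the header links use href instead
Elements asWithName = table.select("tr>td a[name]");
for (Element aWithName : asWithName) {
    String name = aWithName.attr("name");
    System.out.println("hostname=" + name);
    // <a> -> enclosing <td> -> enclosing <tr>
    Element tr = aWithName.parent().parent();
    for (Element td : tr.select("td")) {
        Element img = td.select("img").first();
        if (img == null) {
            continue; // hostname cell or a "-" placeholder cell
        }
        String imgRelPath = img.attr("src");
        System.out.println("  imgRelPath=" + imgRelPath);
    }
}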
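
To turn each imgRelPath into the full URL the question asks for, one option is to resolve it against the site's base URL instead of concatenating strings. This is only a rough sketch: the class name ImageUrlResolver and its methods are made up for illustration, and http://example.com/ is the placeholder host from the question. It uses nothing but java.net.URI from the standard library.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jsoup or Android.
final class ImageUrlResolver {

    // Resolves one SRC value (e.g. "/hobbit/gifs/static/green.gif")
    // against the base URL of the page it came from.
    static String toAbsolute(String baseUrl, String src) {
        return URI.create(baseUrl).resolve(src).toString();
    }

    // Resolves every SRC value in order, so the result keeps the
    // same order as the table cells it was read from.
    static List<String> toAbsolute(String baseUrl, List<String> srcs) {
        List<String> result = new ArrayList<>();
        for (String src : srcs) {
            result.add(toAbsolute(baseUrl, src));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed base URL; replace with the real address of the status page.
        String base = "http://example.com/";
        // prints "http://example.com/hobbit/gifs/static/green.gif"
        System.out.println(toAbsolute(base, "/hobbit/gifs/static/green.gif"));
    }
}
```

Inside the loop above you would call ImageUrlResolver.toAbsolute(base, imgRelPath) and collect the results per hostname. Alternatively, if you pass the page address as the base URI, as in Jsoup.parse(html, baseUri), jsoup can do the resolving for you via img.absUrl("src").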