Parsing with Jsoup into an ArrayList
How can I parse this with Jsoup?
<!-- NOVINEEE -->
<div class="right_naslov"><a href="/e-novine">e-novine</a></div>
<div class="right_post">
<span class="right_post_nadnaslov"><font class="nadnaslov">Zanimljiv zadatak</font></span><span class="right_post_datum"><font class="datum">12.12.2014.</font></span>
<span class="right_post_naslov_v"><font class="naslov"><a href="/e-novine/n/?id=340">Profesor učenicima zadao najbolji zadatak ikad!</a></font></span>
<span class="right_post_podnaslov"><font class="podnaslov"></font></span>
<div class="right_post_tekst"><a href="/e-novine/n/?id=340"><img width="180" align="left" class="novine_slika_thumbm" border="0" src="/fajlovi/slike/thumbm/4161-zadatak_naslovna.jpg" /></a><p>72-godišnji profesor bivšim učenicima iz godine u godinu šalje pisma što nije lak zadatak jer mnogi ne žive u istoj državi. Iako radi nešto stvarno posebno, Bruce sebe i dalje smatra prosječnim profesorom. Učenici ipak smatraju suprotno...</p>
<div> </div></div>
</div>
</div>
I want to get the content of right_naslov, the text of the font elements with class nadnaslov and naslov, and the img src and a href inside right_post_tekst.
I tried this:
Document doc = Jsoup.connect(url).get();
Elements post = doc.select("right_naslov right_post nadnaslov");
HashMap<String, String> map = new HashMap<String, String>();
map.put("rank", post.text());
// Get the second td
map.put("country", post.text());
// Get the third td
map.put("population", post.text());
// Set all extracted Jsoup Elements into the array
arraylist.add(map);
Then I do:
resultp = data.get(position);
// Locate the TextViews in listview_item.xml
rank = (TextView) itemView.findViewById(R.id.rank);
country = (TextView) itemView.findViewById(R.id.country);
population = (TextView) itemView.findViewById(R.id.population);
// Capture position and set results to the TextViews
rank.setText(resultp.get(PocetnaFragment.RANK));
country.setText(resultp.get(PocetnaFragment.COUNTRY));
population.setText(resultp.get(PocetnaFragment.POPULATION));
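I have been following this tutorial: http://www.androidbegin.com/tutorial/android-jsoup-listview-images-texts-html-tables-tutorial/
There are multiple right_post elements.
Thanks
Update
After applying the answer below (and everything in the comments), I get the following error: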
02-14 23:50:17.490 2469-2530/gimbi.edu.ba W/System.err﹕ java.lang.NullPointerException: Attempt to invoke virtual method 'org.jsoup.select.Elements org.jsoup.nodes.Element.select(java.lang.String)' on a null object reference
02-14 23:50:17.494 2469-2530/gimbi.edu.ba W/System.err﹕ at gimbi.edu.ba.PocetnaFragment$JsoupListView.doInBackground(PocetnaFragment.java:147)
02-14 23:50:17.494 2469-2530/gimbi.edu.ba W/System.err﹕ at gimbi.edu.ba.PocetnaFragment$JsoupListView.doInBackground(PocetnaFragment.java:103)
02-14 23:50:17.494 2469-2530/gimbi.edu.ba W/System.err﹕ at android.os.AsyncTask.call(AsyncTask.java:288)
02-14 23:50:17.494 2469-2530/gimbi.edu.ba W/System.err﹕ at java.util.concurrent.FutureTask.run(FutureTask.java:237)
02-14 23:50:17.494 2469-2530/gimbi.edu.ba W/System.err﹕ at android.os.AsyncTask$SerialExecutor.run(AsyncTask.java:231)
02-14 23:50:17.494 2469-2530/gimbi.edu.ba W/System.err﹕ at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
02-14 23:50:17.494 2469-2530/gimbi.edu.ba W/System.err﹕ at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
02-14 23:50:17.494 2469-2530/gimbi.edu.ba W/System.err﹕ at java.lang.Thread.run(Thread.java:818)
02-14 23:50:46.713 2469-4859/gimbi.edu.ba W/System.err﹕ java.lang.NullPointerException: Attempt to invoke virtual method 'org.jsoup.select.Elements org.jsoup.nodes.Element.select(java.lang.String)' on a null object reference
02-14 23:50:46.718 2469-4859/gimbi.edu.ba W/System.err﹕ at gimbi.edu.ba.PocetnaFragment$JsoupListView.doInBackground(PocetnaFragment.java:147)
02-14 23:50:46.718 2469-4859/gimbi.edu.ba W/System.err﹕ at gimbi.edu.ba.PocetnaFragment$JsoupListView.doInBackground(PocetnaFragment.java:103)
02-14 23:50:46.719 2469-4859/gimbi.edu.ba W/System.err﹕ at android.os.AsyncTask.call(AsyncTask.java:288)
02-14 23:50:46.719 2469-4859/gimbi.edu.ba W/System.err﹕ at java.util.concurrent.FutureTask.run(FutureTask.java:237)
02-14 23:50:46.719 2469-4859/gimbi.edu.ba W/System.err﹕ at android.os.AsyncTask$SerialExecutor.run(AsyncTask.java:231)
02-14 23:50:46.719 2469-4859/gimbi.edu.ba W/System.err﹕ at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
02-14 23:50:46.719 2469-4859/gimbi.edu.ba W/System.err﹕ at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
02-14 23:50:46.719 2469-4859/gimbi.edu.ba W/System.err﹕ at java.lang.Thread.run(Thread.java:818)
As I said below, I tried removing the img element, but when I call the map.put method, all the elements are the same.
Read this link to learn how to extract data with Jsoup.
Below is an example based on your scenario.
Document doc = null;
Element aEle = null;
Element fontEle = null;
try {
    doc = ......
    /** Get the A tag directly under the DIV with classname right_naslov **/
    aEle = doc.select("div.right_naslov > a").first();
    if (aEle != null) {
        System.out.println("right_naslov content: " + aEle.ownText());
    }
    /** Get the Font tag with [classname=nadnaslov] under span[classname=right_post_nadnaslov] under div[classname=right_post] **/
    /** The same approach can be used for Font[classname=naslov] **/
    fontEle = doc.select("div.right_post > span.right_post_nadnaslov > font.nadnaslov").first();
    if (fontEle != null) {
        System.out.println("font nadnaslov content: " + fontEle.ownText());
    }
    /** Get the A tag under div[classname=right_post_tekst] under div[classname=right_post] **/
    aEle = doc.select("div.right_post > div.right_post_tekst > a").first();
    if (aEle != null) {
        System.out.println("a href: " + aEle.attr("href"));
        /** Get the inner IMG tag with classname 'novine_slika_thumbm' **/
        Element imgEle = aEle.select("img.novine_slika_thumbm").first();
        if (imgEle != null) {
            System.out.println("img src: " + imgEle.attr("src"));
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
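The example above only works when the HTML document you are parsing contains a single DIV[classname=right_naslov] or DIV[classname=right_post], because I use Elements.first() when extracting data, which means I always select the first element that matches the selector. Try out Jsoup and have fun. Once you have all the data, store it in a HashMap or ArrayList as needed.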
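Since Elements.first() returns null when nothing matches, chaining another call onto it without a check is one plausible source of a NullPointerException like the one in the question's update. Here is a minimal sketch of the null-safe pattern, assuming Jsoup is on the classpath. The class name FirstOrNullSketch, the helper thumbnailSrc, and the inline HTML fragment are made up for illustration and are not taken from the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstOrNullSketch {

    /** Returns the src of the first thumbnail in a post, or null if the post has none. */
    static String thumbnailSrc(Element post) {
        // first() returns null for an empty selection, so check before chaining.
        Element link = post.select("div.right_post_tekst > a[href]").first();
        if (link == null) {
            return null;
        }
        Element img = link.select("img[src]").first();
        return img == null ? null : img.attr("src");
    }

    public static void main(String[] args) {
        // Hypothetical fragment: the second post has no link and no image.
        String html =
              "<div class=\"right_post\"><div class=\"right_post_tekst\">"
            + "<a href=\"/e-novine/n/?id=340\"><img src=\"/thumb.jpg\"></a></div></div>"
            + "<div class=\"right_post\"><div class=\"right_post_tekst\">"
            + "<p>text only</p></div></div>";
        Document doc = Jsoup.parse(html);
        for (Element post : doc.select("div.right_post")) {
            System.out.println(thumbnailSrc(post)); // prints /thumb.jpg, then null
        }
    }
}
```

Keeping the null checks in one small helper means the calling loop only has to deal with a single possibly-null String.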
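Updated
What you can do is select all the DIV[classname=right_post] elements with Document.select(), which returns an Elements object. Then loop over each Element to get its inner data. In my example below, you will end up with two HashMap items in the arraylist variable.
There are 2 div[classname=right_naslov]; I only retrieve the second one, which comes after the <!-- NOVINEEE --> comment. There are 5 div[classname=right_post]; I skip those that have no inner span[classname=right_post_nadnaslov] element.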
// Uses org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
// org.jsoup.select.Elements, and java.util.ArrayList/HashMap/List
List<HashMap<String, String>> arraylist = new ArrayList<HashMap<String, String>>();
Elements aEles = null;
Elements divRightPostEles = null;
String rightNaslov = null;
Document doc = null;
try {
    doc = Jsoup.connect(url).get();
    /** Get the A tags directly under a DIV with classname right_naslov **/
    aEles = doc.select("div.right_naslov > a");
    if (aEles != null && aEles.size() > 0) {
        if (aEles.size() == 2)
            rightNaslov = aEles.get(1).ownText();
        else
            rightNaslov = aEles.first().ownText();
    }
    /**
     * Since you say there are multiple DIVs with right_post as
     * classname, we get all of those right_post elements and loop
     * over them one by one to retrieve their inner elements
     **/
    divRightPostEles = doc.select("div.right_post");
    for (Element rightPostDiv : divRightPostEles) {
        /** Each iteration handles one right_post DIV element **/
        HashMap<String, String> map = new HashMap<String, String>();
        /**
         * Get the Font tag with [classname=nadnaslov] under
         * span[classname=right_post_nadnaslov] under
         * div[classname=right_post]
         **/
        /** The same approach can be used for Font[classname=naslov] **/
        Elements fontNadnaslov = rightPostDiv
                .select("span.right_post_nadnaslov > font.nadnaslov");
        /**
         * Get the A tag under div[classname=right_post_tekst] under
         * div[classname=right_post]
         **/
        Element aRightPostTekst = rightPostDiv.select(
                "div.right_post_tekst > a[href]").first();
        // Retrieve the values from the Jsoup elements
        if (fontNadnaslov != null && fontNadnaslov.size() > 0) {
            map.put("country", fontNadnaslov.first().ownText());
            if (aRightPostTekst != null) {
                map.put("population", aRightPostTekst.attr("href"));
                Element img = aRightPostTekst.select("img[src]").first();
                if (img != null)
                    map.put("image", img.attr("src"));
            }
            if (rightNaslov != null)
                map.put("rank", rightNaslov);
            // Add the map for this post to the list
            arraylist.add(map);
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
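On the "all the elements are the same" symptom: the loop above creates a new HashMap for every right_post and puts a different value under each key. The question's attempt instead puts the same post.text() under all three keys, which is one likely reason every field looked identical. The following is a small, Jsoup-free sketch of the one-map-per-row pattern. The class name RowsSketch, the helper buildRows, and the second title are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowsSketch {

    /** Builds one map per title, so every list entry keeps its own values. */
    static List<Map<String, String>> buildRows(String section, List<String> titles) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String title : titles) {
            // A new map on every iteration; reusing a single map would make
            // all list entries point at the same (last written) values.
            Map<String, String> row = new HashMap<>();
            row.put("rank", section);   // shared section name
            row.put("country", title);  // value specific to this row
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>();
        titles.add("Zanimljiv zadatak");
        titles.add("Second post"); // hypothetical second title
        for (Map<String, String> row : buildRows("e-novine", titles)) {
            System.out.println(row.get("rank") + " | " + row.get("country"));
        }
    }
}
```

The adapter can then read each row with data.get(position) and look up the same keys that were used in put.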