How to parse a table inside a div
<div id="findet_1" name="findet_1" >
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td class="thc01 w160 gL_10 UC" > Standalone</td>
<td class="thc01 w160 gL_10 tar">Jun'16</td>
<td class="thc01 w160 gL_10 tar">Mar'16</td>
<td class="thc01 w160 gL_10 tar">Dec'15</td>
<td class="thc01 w160 gL_10 tar"><div class="PR20">Sep'15</div></td>
</tr>
<tr>
<td class="thc02 w160 gD_12" >Net Sales</td>
<td class="thc02 w160 gD_12 tar">16,339.70</td>
<td class="thc02 w160 gD_12 tar">15,589.40</td>
<td class="thc02 w160 gD_12 tar">15,065.00</td>
<td class="thc02 w160 gD_12 tar"><span class="PR20">14,824.50</span></td>
</tr>
<tr>
<td class="thc02 w160 gD_12" >Other Income</td>
<td class="thc02 w160 gD_12 tar">50.10</td>
<td class="thc02 w160 gD_12 tar">46.30</td>
<td class="thc02 w160 gD_12 tar">153.30</td>
<td class="thc02 w160 gD_12 tar"><span class="PR20">1,087.40</span></td>
</tr>
<tr>
<td class="thc02 w160 gD_12" >PBDIT</td>
<td class="thc02 w160 gD_12 tar">6,612.30</td>
<td class="thc02 w160 gD_12 tar">5,930.60</td>
<td class="thc02 w160 gD_12 tar">5,543.30</td>
<td class="thc02 w160 gD_12 tar"><span class="PR20">5,416.80</span></td>
</tr>
<tr>
<td class="thc02 w160 gD_12" >Net Profit</td>
<td class="thc02 w160 gD_12 tar">1,427.50</td>
<td class="thc02 w160 gD_12 tar">1,693.90</td>
<td class="thc02 w160 gD_12 tar">1,709.10</td>
<td class="thc02 w160 gD_12 tar"><span class="PR20">2,223.70</span></td>
</tr>
</table>
</div>
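I am trying to read this table but have not been able to. I am using BeautifulSoup's findAll to find the div first; the table is inside that div, but I cannot get to the table. My second question is: what is the best way to iterate over the rows? For example, I want the output in CSV format, with every value enclosed in double quotes,
like: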
"STANDALONE","Jun'16","Mar'16","Dec'15","Sep'15"
"Net Sales","16,339.70","15,589.40","15,065.00","14,824.50"
"Other Income","50.10","46.30","153.30","1,087.40"
"PBDIT","6,612.30","5,930.60","5,543.30","5,416.80"
"Net Profit","1,427.50","1,693.90","1,709.10","2,223.70"
My code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("http://www.moneycontrol.com/india/stockpricequote/computers-software/tataconsultancyservices/TCS")
bsObj = BeautifulSoup(html, "html.parser")
link = bsObj.findAll("div", id="findet_1")
table1 = link.find('table').find_all('tr')
I know we can use get_text to get the values and a for loop to iterate over the rows, but I cannot find the table :(
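The only difference is that find_all() returns a list containing the single result, while find() returns the result itself. In your code, link is therefore a list, and a list has no find() method, which is why link.find('table') fails.
If find_all() cannot find anything, it returns an empty list. If find() cannot find anything, it returns None: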
link = bsObj.findAll("div", id="findet_1")
if link:
    table1 = link[0].find('table').find_all('tr')
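The difference described above can be checked with a small sketch. It parses a trimmed copy of the markup from the question as a string instead of fetching the live page, and the id "no_such_id" is made up purely to show the not-found case.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the real page
# serves the same structure in its static HTML).
html = """
<div id="findet_1" name="findet_1">
<table>
<tr><td>Standalone</td><td>Jun'16</td></tr>
<tr><td>Net Sales</td><td>16,339.70</td></tr>
</table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like ResultSet, even when there is one match.
divs = soup.find_all("div", id="findet_1")

# find() returns the first matching Tag directly.
div = soup.find("div", id="findet_1")

# Not-found behaviour: empty list versus None ("no_such_id" is hypothetical).
missing_all = soup.find_all("div", id="no_such_id")
missing_one = soup.find("div", id="no_such_id")

# Because find() gives a Tag, the calls can be chained as the question intended.
rows = div.find("table").find_all("tr")
print(len(divs), len(rows), missing_one)
```

Since find() returns None when nothing matches, it is worth checking the result before chaining further calls on the real page.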
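Try this (call find() on the BeautifulSoup object, bsObj in your code, not on the urlopen response):
table_div = bsObj.find('div', {'id': 'findet_1', 'name': 'findet_1'})
table = table_div.find('table')
Or this, to get the rows directly:
table_div = bsObj.find('div', {'id': 'findet_1', 'name': 'findet_1'})
rows = table_div.find_all('tr')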
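For the second part of the question (iterating over the rows and writing quoted CSV), here is a minimal sketch rather than a definitive solution. The helper name table_to_csv and the shortened sample markup are assumptions made for illustration; on the real page you would pass in the HTML returned by urlopen. It uses the standard csv module with QUOTE_ALL so that every field is wrapped in double quotes.

```python
import csv
import io

from bs4 import BeautifulSoup

# Shortened sample of the markup from the question.
SAMPLE = """
<div id="findet_1" name="findet_1">
<table>
<tr>
<td class="thc01 UC"> Standalone</td>
<td class="thc01 tar">Jun'16</td>
<td class="thc01 tar"><div class="PR20">Sep'15</div></td>
</tr>
<tr>
<td class="thc02">Net Sales</td>
<td class="thc02 tar">16,339.70</td>
<td class="thc02 tar"><span class="PR20">14,824.50</span></td>
</tr>
</table>
</div>
"""


def table_to_csv(markup, div_id):
    """Return the first table inside the div with the given id as quoted CSV.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    div = soup.find("div", id=div_id)
    if div is None:
        raise ValueError("no div with id %r" % div_id)
    table = div.find("table")
    if table is None:
        raise ValueError("no table inside div %r" % div_id)

    out = io.StringIO()
    # QUOTE_ALL puts double quotes around every field.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for tr in table.find_all("tr"):
        # get_text(strip=True) also reaches text nested in <div> or <span>.
        writer.writerow([td.get_text(strip=True) for td in tr.find_all("td")])
    return out.getvalue()


result = table_to_csv(SAMPLE, "findet_1")
print(result, end="")
```

Note that the first cell comes out as "Standalone", not "STANDALONE": the uppercase form in the desired output presumably comes from the page's CSS (the UC class), so call .upper() on that cell if you need it.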