How do I grab the numbers that end in "0" from a website?
I used BeautifulSoup to scrape some text from the URL "https://nature.altmetric.com/details/114136890" and got this response:
# The table is called twitterGeographical_TableChoice
<table>
<tr>
<th>Country</th>
<th class="num">Count</th>
<th class="num percent">As %</th>
</tr>
<tr>
<td>Japan</td>
<td class="num">3</td>
<td class="num">12%</td>
</tr>
<tr>
<td>Poland</td>
<td class="num">3</td>
<td class="num">12%</td>
</tr>
<tr>
<td>Spain</td>
<td class="num">3</td>
<td class="num">12%</td>
</tr>
<tr>
<td>El Salvador</td>
<td class="num">2</td>
<td class="num">8%</td>
</tr>
<tr>
<td>Ecuador</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr>
<td>Mexico</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr>
<td>Chile</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr>
<td>India</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr class="meta">
<td>Unknown</td>
<td class="num">10</td>
<td class="num">40%</td>
</tr>
</table>
Then I want to get the numbers out of it, and I am using a regular expression to do that.
My pattern is
twitterGeographical_Table_Num_pattern = re.compile(r'<td class="num">(\d*%)</td>', re.S)
twitterGeographical_Table_Num = twitterGeographical_Table_Num_pattern.findall(twitterGeographical_TableChoice)
But I only get 4%, not 40%. I am puzzled. Thanks for any help!
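I'm not sure why you are using the regex module to get the numbers when BeautifulSoup already has plenty of methods for that. Anyway, if you are interested in regex, you can use this pattern instead: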
<td class=\"num\">((\d+)(%)?)</td>
Then you can get the numbers (with the percent sign, if there is one) using the following code:
[x[0] for x in twitterGeographical_Table_Num]
Output
['10', '40%']
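To make the pattern above easier to try out, here is a minimal self-contained sketch. The `html` string is a shortened copy of the question's table (last two rows only), and the names `html`, `cells` and `percents` are illustrative, not taken from the original code. Because the pattern has three groups, `findall` returns a 3-tuple per cell, which is why the first element is picked out.

```python
import re

# Shortened copy of the table from the question (last two rows only).
html = """
<tr>
<td>India</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr class="meta">
<td>Unknown</td>
<td class="num">10</td>
<td class="num">40%</td>
</tr>
"""

# Pattern from the answer: group 1 is the whole cell text,
# group 2 the digits, group 3 the optional percent sign.
pattern = re.compile(r'<td class="num">((\d+)(%)?)</td>')

matches = pattern.findall(html)              # one 3-tuple per matching cell
cells = [m[0] for m in matches]              # full cell text
percents = [m[0] for m in matches if m[2]]   # only cells that end in "%"

print(cells)
print(percents)
```

Filtering on the third group is one way to keep only the percentage cells, if those are the ones you are after.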
Side note: please consider giving your variables shorter, clearer names! :)
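On the point that an HTML parser can do this without regex: the sketch below uses the standard library's `html.parser.HTMLParser` as a stand-in so that it runs without extra installs. With BeautifulSoup the equivalent would probably be something along the lines of `soup.find_all("td", class_="num")` followed by reading each cell's text. The class name `NumCellParser` and the `sample` string are made up for this illustration.

```python
from html.parser import HTMLParser


class NumCellParser(HTMLParser):
    """Collect the text of every <td> cell whose class list contains "num"."""

    def __init__(self):
        super().__init__()
        self.in_num_cell = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "num" in classes:
            self.in_num_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_num_cell = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_num_cell and text:
            self.values.append(text)


# Shortened, hypothetical copy of the table from the question.
sample = """
<table>
<tr><th>Country</th><th class="num">Count</th><th class="num percent">As %</th></tr>
<tr><td>India</td><td class="num">1</td><td class="num">4%</td></tr>
<tr class="meta"><td>Unknown</td><td class="num">10</td><td class="num">40%</td></tr>
</table>
"""

parser = NumCellParser()
parser.feed(sample)
parser.close()

values = parser.values
percent_values = [v for v in values if v.endswith("%")]

print(values)
print(percent_values)
```

The header cells are skipped because they are `<th>` rather than `<td>`, so no extra filtering on the header row is needed.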