How to get first child table row from a table in BeautifulSoup (Python)
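Here is the code and a sample result. I only want the first column of the table and want to ignore the rest. There are similar questions on Stack Overflow, but they didn't help.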
<tr>
<td>JOHNSON</td>
<td> 2,014,470 </td>
<td>0.81</td>
<td>2</td>
</tr>
I only want JOHNSON, since it is the first child.
My Python code is:
import requests
from bs4 import BeautifulSoup

def find_raw():
    url = 'http://names.mongabay.com/most_common_surnames.htm'
    r = requests.get(url)
    html = r.content
    soup = BeautifulSoup(html)
    # Prints the text of every cell in each row, not just the first one
    for n in soup.find_all('tr'):
        print(n.text)

find_raw()
What I get:
SMITH 2,501,922 1.0061
JOHNSON 2,014,470 0.812
You can find all the tr tags with find_all, then for each tr call find (which returns only the first match) for td. If one exists, print it:
for tr in soup.find_all('tr'):
    td = tr.find('td')  # first <td> in this row, or None
    if td:
        print(td)
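The loop above prints the whole <td> tag rather than just its text. A minimal sketch of the same idea that collects only the cell text, assuming Python 3; the inline `html` sample (including its header row) and the `first_cells` helper are hypothetical, made up for illustration from the rows in the question:

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the rows shown in the question.
html = """
<table>
<tr><th>Surname</th><th>Count</th></tr>
<tr><td>SMITH</td><td> 2,501,922 </td><td>1.006</td><td>1</td></tr>
<tr><td>JOHNSON</td><td> 2,014,470 </td><td>0.81</td><td>2</td></tr>
</table>
"""

def first_cells(markup):
    """Return the stripped text of the first <td> in every <tr>."""
    soup = BeautifulSoup(markup, "html.parser")
    names = []
    for tr in soup.find_all("tr"):
        td = tr.find("td")  # None for rows without a <td>, e.g. header rows
        if td is not None:
            names.append(td.get_text(strip=True))
    return names

print(first_cells(html))
```

Rows that contain only <th> cells are skipped, because find returns None for them.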
Iterate over the tr elements, then print the text of the first td:
import bs4

# data is assumed to hold the page's HTML
for tr in bs4.BeautifulSoup(data).select('tr'):
    try:
        print(tr.select('td')[0].text)
    except IndexError:  # row has no <td>
        pass
Or shorter:
>>> [tr.td for tr in bs4.BeautifulSoup(data).select('tr') if tr.td]
[<td>SMITH</td>, <td>JOHNSON</td>, <td>WILLIAMS</td>, <td>JONES</td>, ...]
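As a possible alternative to the tr.td comprehension above, a CSS selector can pick the first cell of each row directly. This is only a sketch: it assumes Python 3 and a Beautiful Soup release whose select() understands :first-child (4.7 or later, which delegates selectors to soupsieve), and the inline `html` sample is hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the rows shown in the question.
html = """
<table>
<tr><td>SMITH</td><td> 2,501,922 </td><td>1.006</td><td>1</td></tr>
<tr><td>JOHNSON</td><td> 2,014,470 </td><td>0.81</td><td>2</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# "tr > td:first-child" matches a <td> only when it is the first element
# child of its row, so nested or later cells are ignored.
first_column = [td.get_text(strip=True)
                for td in soup.select("tr > td:first-child")]
print(first_column)
```

Unlike tr.td, which returns the first <td> found anywhere inside the row, this selector skips rows whose first element is not a <td> (for example a row starting with <th>).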
Related posts:
- Is there a clean way to get the n-th column of an html table using BeautifulSoup?
- Extracting selected columns from a table using BeautifulSoup
- CSS select with beautifulsoup4 doesn't work
- Python BeautifulSoup Getting a column from table - IndexError List index out of range
- BeautifulSoup Specify table column by number?