How to get the first child table row from a table in BeautifulSoup (Python)

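Here is the code and a sample result. I only want the first column of the table and want to ignore the rest. There are similar questions on Stack Overflow, but they didn't help.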

<tr>
<td>JOHNSON</td>
<td> 2,014,470 </td>
<td>0.81</td>
<td>2</td>
</tr>

I only want JOHNSON, because it is the first child. My Python code is:

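import requests
from bs4 import BeautifulSoup


def find_raw():
    url = 'http://names.mongabay.com/most_common_surnames.htm'
    r = requests.get(url)
    html = r.content
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(html, 'html.parser')
    for n in soup.find_all('tr'):
        # Prints the text of every cell in the row, not just the first one
        print(n.text)


find_raw()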

What I get:

SMITH 2,501,922 1.0061
JOHNSON 2,014,470 0.812

You can find all the tr tags with find_all, then for each tr call find (which returns only the first match) to get its td. If one exists, print it:

for tr in soup.find_all('tr'):
    # find returns the first matching tag, or None if the row has no td
    td = tr.find('td')
    if td:
        print(td)
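
Since the question asks for the name itself rather than the whole tag, a minimal sketch of the same find_all / find approach that collects only the cell text could look like the following. The SAMPLE_HTML string and the first_cells helper are made up for illustration (the rows are modelled on the ones in the question), and html.parser is just one possible parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the rows shown in the question.
SAMPLE_HTML = """
<table>
  <tr><th>Surname</th><th>Count</th></tr>
  <tr><td>SMITH</td><td> 2,501,922 </td><td>1.006</td><td>1</td></tr>
  <tr><td>JOHNSON</td><td> 2,014,470 </td><td>0.81</td><td>2</td></tr>
</table>
"""


def first_cells(html):
    """Return the stripped text of the first <td> in each <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for tr in soup.find_all("tr"):
        td = tr.find("td")  # first <td> in the row, or None (e.g. header rows)
        if td is not None:
            names.append(td.get_text(strip=True))
    return names


print(first_cells(SAMPLE_HTML))
```

Rows that contain only th cells are skipped because find returns None for them, and get_text(strip=True) drops the surrounding whitespace.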

Iterate over the tr elements, then print the text of the first td:

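import bs4

# data is assumed to hold the HTML source as a string
for tr in bs4.BeautifulSoup(data, 'html.parser').select('tr'):
    try:
        print(tr.select('td')[0].text)
    except IndexError:
        # Row has no td cells (for example a header row); skip it
        pass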

Or shorter:

>>> [tr.td for tr in bs4.BeautifulSoup(data).select('tr') if tr.td]
[<td>SMITH</td>, <td>JOHNSON</td>, <td>WILLIAMS</td>, <td>JONES</td>, ...]
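
If only the text is wanted, a CSS selector can pick the first td of each row directly. This is a rough sketch rather than part of the original answer: ROWS_HTML and first_column are hypothetical names, and it assumes a BeautifulSoup version whose select supports nth-of-type.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the rows shown in the question.
ROWS_HTML = """
<table>
  <tr><td>SMITH</td><td> 2,501,922 </td><td>1.006</td><td>1</td></tr>
  <tr><td>JOHNSON</td><td> 2,014,470 </td><td>0.81</td><td>2</td></tr>
</table>
"""


def first_column(html):
    """Return the text of the first <td> child of every <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    # "tr > td:nth-of-type(1)" matches the first td among each row's children
    cells = soup.select("tr > td:nth-of-type(1)")
    return [td.get_text(strip=True) for td in cells]


print(first_column(ROWS_HTML))
```

Compared with the list comprehension above, this returns plain strings instead of td tags.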
