Beautiful soup finding the first sibling of a known object with a known attribute

I have the following code, which selects certain cells in a table element:

tag = soup.find_all('td', attrs={'class': 'I'})

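As shown in attached image 1, I would like to find that cell's first sibling within the same "even_row" row. Ideally the selection would output only the value of data-seconds, in this case "58". Not every "even_row" row contains an element with class "I", and some contain more than one, so I only need the data-seconds value for the "even_row" rows that have an element with class "I".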

Any help would be greatly appreciated; I have searched the documentation without success.

The HTML looks like this:

<tr class='even_row'>
    <td class='row_labels' data-seconds="58">
        <div class='celldiv slots1'></div>
    </td>
    <td class='new'>...</td>
    <td class='I'>...</td>
    <td class='new'>...</td>
    <td class='new'>...</td>
</tr>

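I can't test this properly without the full HTML, but it sounds like with bs4 4.7.1+ you can use :has to express your requirement as .even_row:has(.I), i.e. a parent with class even_row that has a child with class I. Then append [data-seconds] as a descendant selector to collect the data-seconds attribute value of every matching child: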

print([i['data-seconds'] for i in soup.select('.even_row:has(.I) [data-seconds]')])
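A self-contained sketch of the selector above, assuming bs4 4.7 or later (which hands select() to the soupsieve package). The sample HTML is hypothetical: the second row has a data-seconds value of 59 but no class-I cell, so only 58 should come back.

```python
from bs4 import BeautifulSoup  # bs4 >= 4.7 delegates select() to soupsieve

# Hypothetical sample: only the first row contains a class-I cell
HTML = """
<table>
<tr class='even_row'>
  <td class='row_labels' data-seconds="58"></td>
  <td class='new'>...</td>
  <td class='I'>...</td>
</tr>
<tr class='even_row'>
  <td class='row_labels' data-seconds="59"></td>
  <td class='new'>...</td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, 'html.parser')
# .even_row:has(.I)  -> rows with class even_row that contain a class-I descendant
# [data-seconds]     -> any descendant of such a row carrying a data-seconds attribute
seconds = [el['data-seconds'] for el in soup.select('.even_row:has(.I) [data-seconds]')]
print(seconds)  # ['58']
```

The 59 row is filtered out by :has, so no separate Python-side check for class I is needed.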

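One way to solve this is to pass True as the attribute value, which matches any tag that has the attribute, whatever its value: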

from bs4 import BeautifulSoup
html = """
<tr class='even_row'>
    <td class='row_labels' data-seconds="58">
        <div class='celldiv slots1'></div>
    </td>
    <td class='new'>...</td>
    <td class='I'>...</td>
    <td class='new'>...</td>
    <td class='new'>...</td>
</tr>
<tr class='even_row'>
    <td class='row_labels' >
        <div class='celldiv slots1'></div>
    </td>
    <td class='new'>...</td>
    <td class='I'>...</td>
    <td class='new'>...</td>
    <td class='new'>...</td>
</tr>
"""


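soup = BeautifulSoup(html, 'html.parser')
even_rows = soup.find_all('tr', attrs={'class': 'even_row'})
for row in even_rows:
    # Skip rows that contain no <td class="I"> cell
    if row.find('td', attrs={'class': 'I'}) is None:
        continue
    # {'data-seconds': True} matches any <td> that has the attribute
    tag = row.find('td', {'data-seconds': True})
    if tag is not None:
        print(tag.get('data-seconds'))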

Output:

58
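The title asks literally for the sibling of each class-I cell. A minimal sketch of that reading uses Beautiful Soup's find_previous_sibling to walk back from the class-I cell to the nearest sibling <td> carrying data-seconds. The function name seconds_via_siblings and the sample rows (59 and 60) are hypothetical illustrations, not part of the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: row 2 has no class-I cell, row 3 has two of them
HTML = """
<table>
<tr class='even_row'>
  <td class='row_labels' data-seconds="58"></td>
  <td class='new'>...</td>
  <td class='I'>...</td>
</tr>
<tr class='even_row'>
  <td class='row_labels' data-seconds="59"></td>
  <td class='new'>...</td>
</tr>
<tr class='even_row'>
  <td class='row_labels' data-seconds="60"></td>
  <td class='I'>...</td>
  <td class='I'>...</td>
</tr>
</table>
"""

def seconds_via_siblings(html):
    """Hypothetical helper: one data-seconds value per even_row that has a class-I cell."""
    soup = BeautifulSoup(html, 'html.parser')
    values = []
    for row in soup.find_all('tr', class_='even_row'):
        # Use only the first class-I cell, so a row with several yields one value
        i_cell = row.find('td', class_='I')
        if i_cell is None:
            continue
        # Walk backwards over sibling <td> tags to the nearest one with data-seconds
        label = i_cell.find_previous_sibling('td', attrs={'data-seconds': True})
        if label is not None:
            values.append(label['data-seconds'])
    return values

print(seconds_via_siblings(HTML))  # ['58', '60']
```

Passing a tag name to find_previous_sibling skips the whitespace text nodes between cells, so no manual filtering of NavigableString objects is needed.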

Another way is to use a regular expression:

import re
# re.compile(r".*") matches any value, so this finds every <td> with a data-seconds attribute
tds = [tag.get('data-seconds') for tag in soup.find_all("td", {"data-seconds": re.compile(r".*")})]
print(tds)

Output:

['58']
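Since r".*" accepts every value, it effectively behaves like passing True. A regex earns its keep when the pattern is stricter. A minimal sketch, assuming you only want purely numeric values; the "n/a" value in the sample HTML is hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample: the second row has a non-numeric data-seconds value
HTML = """
<table>
<tr class='even_row'>
  <td class='row_labels' data-seconds="58"></td>
  <td class='I'>...</td>
</tr>
<tr class='even_row'>
  <td class='row_labels' data-seconds="n/a"></td>
  <td class='I'>...</td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, 'html.parser')

# r'.*' matches any attribute value that is present
any_value = [td['data-seconds']
             for td in soup.find_all('td', attrs={'data-seconds': re.compile(r'.*')})]

# r'^\d+$' keeps only values made entirely of digits
numeric = [td['data-seconds']
           for td in soup.find_all('td', attrs={'data-seconds': re.compile(r'^\d+$')})]

print(any_value)  # ['58', 'n/a']
print(numeric)    # ['58']
```

Beautiful Soup applies the compiled pattern with search(), which is why the ^ and $ anchors are needed to reject partial matches.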