Beautiful Soup: finding the first sibling of a known object with a known attribute
I have the following code that selects a certain cell in a table element:
tag = soup.find_all('td', attrs={'class': 'I'})
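As shown in attached image 1, I'd like to be able to find its first sibling within the same "even_row" class. Ideally the selection would output only the contents of data-seconds, in this case "58". Not every "even_row" has an element with class "I", and some have more than one, so I only need the data-seconds value for the "even_row" rows that contain an element with class "I".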
Any help would be much appreciated; I've been searching the documentation without success.
The HTML looks like this:
<tr class='even_row'>
<td class='row_labels' data-seconds="58">
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>
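A minimal sketch of the literal "first sibling" route, starting from the question's own find_all('td', attrs={'class': 'I'}) call. It assumes the data-seconds cell always comes before the "I" cells in its row (as in the snippet above) and uses hypothetical sample markup in which one row has two "I" cells; find_previous_sibling walks backwards over the earlier sibling tags of each "I" cell, and rows are de-duplicated so each is reported once.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question: the first row has two
# class="I" cells, the second row has none
html = """
<table>
<tr class='even_row'>
<td class='row_labels' data-seconds="58"><div class='celldiv slots1'></div></td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='I'>...</td>
</tr>
<tr class='even_row'>
<td class='row_labels' data-seconds="59"><div class='celldiv slots1'></div></td>
<td class='new'>...</td>
</tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

seconds = []
seen_rows = set()  # id() of rows already handled, so a row with several "I" cells counts once
for cell in soup.find_all("td", attrs={"class": "I"}):
    row = cell.parent
    # Ignore "I" cells that are not inside an even_row <tr>
    if row is None or "even_row" not in row.get("class", []):
        continue
    if id(row) in seen_rows:
        continue
    seen_rows.add(id(row))
    # Step backwards over the preceding <td> siblings to the nearest one with data-seconds
    label = cell.find_previous_sibling("td", attrs={"data-seconds": True})
    if label is not None:
        seconds.append(label["data-seconds"])

print(seconds)  # ['58']
```

If the data-seconds cell could also appear after an "I" cell, calling row.find("td", attrs={"data-seconds": True}) on the parent row is the simpler choice.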
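Without the full HTML I can't test this properly, but it sounds like with bs4 4.7.1+ you can use :has to meet your requirement: .even_row:has(.I) selects a parent with class even_row that has a descendant with class I, and adding [data-seconds] then matches every descendant of that parent that carries a data-seconds attribute value.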
print([i['data-seconds'] for i in soup.select('.even_row:has(.I) [data-seconds]')])
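A self-contained sketch of the :has approach, assuming bs4 4.7.1 or later (where select() is backed by soupsieve) and hypothetical sample markup in which only the first row contains an "I" cell. The variant at the end keeps at most one value per matching row by calling select_one on each row, which matches the "first sibling" wording of the question.

```python
from bs4 import BeautifulSoup  # select() with :has() needs bs4 4.7.1+ (soupsieve)

# Hypothetical sample: only the first row contains a class="I" cell
html = """
<table>
<tr class='even_row'>
<td class='row_labels' data-seconds="58"></td>
<td class='I'>...</td>
</tr>
<tr class='even_row'>
<td class='row_labels' data-seconds="72"></td>
<td class='new'>...</td>
</tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# even_row elements that contain a .I descendant, then every descendant of
# those rows that has a data-seconds attribute
seconds = [el["data-seconds"] for el in soup.select(".even_row:has(.I) [data-seconds]")]
print(seconds)  # ['58']

# Variant: at most one value per matching row (the first data-seconds cell)
first_per_row = []
for row in soup.select("tr.even_row:has(td.I)"):
    label = row.select_one("td[data-seconds]")
    if label is not None:
        first_per_row.append(label["data-seconds"])
print(first_per_row)  # ['58']
```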
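One way to solve this is to pass True as the attribute value, which matches any tag that has the attribute, whatever its value: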
from bs4 import BeautifulSoup
html = """
<tr class='even_row'>
<td class='row_labels' data-seconds="58">
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>
</tr>
<tr class='even_row'>
<td class='row_labels' >
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>
</tr>
"""
soup = BeautifulSoup(html,'html.parser')
even_rows = soup.find_all('tr', attrs={'class': 'even_row'})
for row in even_rows:
    # True matches any <td> that has a data-seconds attribute, whatever its value
    tag = row.find("td", {"data-seconds": True})
    if tag is not None:
        print(tag.get('data-seconds'))
Output:
58
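The loop above prints data-seconds for every even_row, whether or not that row contains a class "I" cell. A minimal sketch that adds the question's "I" requirement, assuming hypothetical sample markup with three rows: one with both an "I" cell and data-seconds, one with an "I" cell only, and one with data-seconds only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: row 1 has an "I" cell and data-seconds, row 2 has an
# "I" cell but no data-seconds, row 3 has data-seconds but no "I" cell
html = """
<table>
<tr class='even_row'><td class='row_labels' data-seconds="58"></td><td class='I'>...</td></tr>
<tr class='even_row'><td class='row_labels'></td><td class='I'>...</td></tr>
<tr class='even_row'><td class='row_labels' data-seconds="30"></td><td class='new'>...</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

results = []
for row in soup.find_all("tr", attrs={"class": "even_row"}):
    # Skip rows that do not contain a <td class="I">
    if row.find("td", attrs={"class": "I"}) is None:
        continue
    # data-seconds=True matches any <td> that has the attribute, whatever its value
    tag = row.find("td", attrs={"data-seconds": True})
    if tag is not None:
        results.append(tag["data-seconds"])

print(results)  # ['58']
```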
Another way is to use a regular expression:
import re
# r".*" accepts any value, so this keeps every <td> that has a data-seconds attribute
tds = [tag.get('data-seconds') for tag in soup.find_all("td", {"data-seconds": re.compile(r".*")})]
print(tds)
Output:
['58']
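Because r".*" accepts any value, the regex version behaves like passing True. A regex becomes useful when the value itself should be constrained; the sketch below, using a hypothetical row with one numeric and one non-numeric data-seconds value, contrasts r".*" with a stricter r"^\d+$" pattern. Tags without the attribute never match either filter.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample with one numeric and one non-numeric data-seconds value
html = """
<table><tr class='even_row'>
<td class='row_labels' data-seconds="58">...</td>
<td class='row_labels' data-seconds="n/a">...</td>
<td class='I'>...</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# r".*" accepts any value, so only the presence of the attribute matters
any_value = [td["data-seconds"] for td in soup.find_all("td", attrs={"data-seconds": re.compile(r".*")})]

# A stricter pattern keeps only purely numeric values
numeric = [td["data-seconds"] for td in soup.find_all("td", attrs={"data-seconds": re.compile(r"^\d+$")})]

print(any_value)  # ['58', 'n/a']
print(numeric)    # ['58']
```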