Python Beautiful Soup 4 Get Children of Element with .select()
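The .select() method lets me get elements from a web page using a CSS selector, but it searches the whole page. How would I use .select() but search only the children of a specific element? For example: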
<!-- Simplified example of the structure -->
<ul>
<li>
<div class="foo">foo content</div>
<div class="bar">bar content</div>
<div class="baz">baz content</div>
</li>
<li>
<!-- We can't assume that foo, bar, and baz will always be there -->
<div class="foo">foo content</div>
<div class="baz">baz content</div>
</li>
<li>
<div class="foo">foo content</div>
<div class="bar">bar content</div>
<div class="baz">baz content</div>
</li>
</ul>
I want a way to say:
for <li>[0], foo contains the value "foo content", bar contains the value "bar content", and so on.
Currently my solution is as follows:
foos = soup.select("div.foo")
bars = soup.select("div.bar")
bazs = soup.select("div.baz")
for i in range(len(foos)):
    # Positional arguments must come before keyword arguments in str.format()
    print("{i} contains: {} and {} and {}".format(foos[i], bars[i], bazs[i], i=i))
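This works most of the time. But when one of the elements is missing from one of the li tags, it breaks completely. As I showed in the HTML, we can't assume that foo, bar, and baz will all be present.
So how would I search only the children of each li, so that I could do something like this: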
for i in soup.select("li"):
    # how would I do this:
    foo = child_of("li", "div.foo")  # ????
    bar = child_of("li", "div.bar")  # ????
    baz = child_of("li", "div.baz")  # ????
You can use element:nth-of-type(n) like this:
from bs4 import BeautifulSoup
a = """<!-- Simplified example of the structure -->
<ul>
<li>
<div class="foo">foo1 content</div>
<div class="bar">bar1 content</div>
<div class="baz">baz1 content</div>
</li>
<li>
<!-- We can't assume that foo, bar, and baz will always be there -->
<div class="foo">foo2 content</div>
<div class="baz">baz2 content</div>
</li>
<li>
<div class="foo">foo3 content</div>
<div class="bar">bar3 content</div>
<div class="baz">baz3 content</div>
</li>
</ul>
"""
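# Name the parser explicitly so the result does not depend on what is installed
s = BeautifulSoup(a, 'html.parser')
# nth-of-type is 1-based, so this is the second <li>
s2 = s.select('ul > li:nth-of-type(2)')[0]
foo, bar, baz = s2.select('div.foo'), s2.select('div.bar'), s2.select('div.baz')
print(foo, bar, baz)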
Output:
[<div class="foo">foo2 content</div>] [] [<div class="baz">baz2 content</div>]
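If you only expect one match per class, a variation on the above is to use select_one(), which returns a single tag or None instead of a list. This is just a rough sketch, assuming a cut-down version of the question's HTML; the variable names are my own.

```python
from bs4 import BeautifulSoup

# Cut-down sample: the second <li> has no div.bar
html = """
<ul>
  <li><div class="foo">foo1 content</div><div class="bar">bar1 content</div></li>
  <li><div class="foo">foo2 content</div></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# nth-of-type counts from 1, so (2) is the second <li>
second = soup.select_one("ul > li:nth-of-type(2)")

# select_one() gives back the first match, or None when nothing matches
foo = second.select_one("div.foo")
bar = second.select_one("div.bar")

foo_text = foo.get_text() if foo is not None else None
bar_text = bar.get_text() if bar is not None else None
print(foo_text, bar_text)
```

Checking for None explicitly avoids the IndexError you would get from select(...)[0] on an empty result.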
for li in soup.select('li'):
    foo = li.select('.foo')
    bar = li.select('.bar')
    baz = li.select('.baz')
Each time you iterate over an li tag and call select() on it, the HTML being searched is just the content of that li tag, for example:
<li>
<div class="foo">foo content</div>
<div class="bar">bar content</div>
<div class="baz">baz content</div>
</li>
So you can call select() on the li to select its children, because the li only contains its own child tags.
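One thing worth knowing: select() called on a tag matches any descendant of that tag, not only direct children. In the question's HTML that makes no difference, but with nested markup it can. Here is a small sketch of the difference, assuming Beautiful Soup 4.7 or later (where select() is backed by soupsieve and supports :scope). The nested sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample: the first <li> has a div.foo nested inside div.bar
html = """
<ul>
  <li>
    <div class="foo">outer foo</div>
    <div class="bar"><div class="foo">nested foo</div></div>
  </li>
  <li>
    <div class="baz">baz only</div>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
first_li, second_li = soup.select("ul > li")

# Searches everything inside first_li, at any depth
descendants = [d.get_text() for d in first_li.select("div.foo")]

# :scope refers to the tag select() was called on, so this is direct children only
direct = [d.get_text() for d in first_li.select(":scope > div.foo")]

# Nothing from first_li leaks into a search on second_li
missing = second_li.select("div.foo")

print(descendants)
print(direct)
print(missing)
```

Either way, the search never leaves the li you called it on, which is what fixes the misalignment in the question.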
This worked for me; all the foos, bars, and bazs are stored in separate lists:
foos = []
bars = []
bazs = []
for li in soup.find_all('li'):
    # Re-parse each <li> so the searches below only see that item's markup
    soup2 = BeautifulSoup(str(li), 'html.parser')
    print(soup2)
    for div in soup2.find_all('div', {'class': 'foo'}):
        foos.append(div)
    for div in soup2.find_all('div', {'class': 'bar'}):
        bars.append(div)
    for div in soup2.find_all('div', {'class': 'baz'}):
        bazs.append(div)
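Note that separate lists can still drift out of step when an li is missing an element, and re-parsing is not strictly needed because find() and find_all() can be called on the li tag directly. A possible alternative, sketched here with the sample HTML from the answer above, is to build one record per li and store None for anything missing. The rows name and the dict layout are my own choice, not something from the question.

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li>
    <div class="foo">foo1 content</div>
    <div class="bar">bar1 content</div>
    <div class="baz">baz1 content</div>
  </li>
  <li>
    <div class="foo">foo2 content</div>
    <div class="baz">baz2 content</div>
  </li>
  <li>
    <div class="foo">foo3 content</div>
    <div class="bar">bar3 content</div>
    <div class="baz">baz3 content</div>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for li in soup.find_all("li"):
    row = {}
    for name in ("foo", "bar", "baz"):
        # find() on the <li> only looks inside that <li>
        div = li.find("div", class_=name)
        row[name] = div.get_text(strip=True) if div is not None else None
    rows.append(row)

for i, row in enumerate(rows):
    print("{} contains: {foo} and {bar} and {baz}".format(i, **row))
```

Because every li produces exactly one row, index i always refers to the same list item for foo, bar, and baz.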