How to extract inline CSS style without a style key in XHTML using Beautiful Soup
<p height="1em" width="0" align="justify">“<span><i>
There’s the feather bed element here brother, ach! and not only that! There’s an attraction here—here you have the end of the world, an anchorage, a quiet haven, the navel of the earth, the three fishes that are the foundation of the world, the essence of pancakes, of savoury fish-pies, of the evening samovar, of soft sighs and warm shawls, and hot stoves to sleep on—as snug as though you were dead, and yet you’re alive—the advantages of both at once.
</i></span>”</p>
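With bs4, I want to use p["style"] to extract the height="1em" width="0" align="justify" information, but it raises a KeyError. How can I parse inline CSS styles like this?
Answer: this HTML snippet does not actually contain any inline CSS. height, width, and align are ordinary old-style attributes, not declarations inside a style attribute, so there is no "style" key to look up. Just read the attributes themselves.
Using BeautifulSoup and CSS selectors: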
from bs4 import BeautifulSoup
html='''<p height="1em" width="0" align="justify"><span><i>
There’s the feather bed element here brother, ach! and not only that! There’s an attraction here—here you have the end of the world, an anchorage, a quiet haven, the navel of the earth, the three fishes that are the foundation of the world, the essence of pancakes, of savoury fish-pies, of the evening samovar, of soft sighs and warm shawls, and hot stoves to sleep on—as snug as though you were dead, and yet you’re alive—the advantages of both at once.
</i></span></p>'''
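soup = BeautifulSoup(html, 'html.parser')
# Each attribute selector matches the <p> carrying that attribute;
# indexing the returned tag by name reads the attribute's value.
print(soup.select_one('p[height]')['height'])
print(soup.select_one('p[width]')['width'])
print(soup.select_one('p[align]')['align'])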
Output:
1em
0
justify
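The answer mentions iterating over the attributes, while the code above looks each one up by name. As a minimal sketch of the iteration itself, the snippet below reads every attribute from Tag.attrs and uses Tag.get() to avoid the KeyError when style is absent. The shortened HTML string and the parse_inline_style helper are assumptions added for illustration; the helper is hypothetical, not part of bs4, and its naive split on ";" would break on values that themselves contain semicolons (for example data URLs).

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the paragraph in the question (assumption).
html = '<p height="1em" width="0" align="justify"><span><i>text</i></span></p>'
soup = BeautifulSoup(html, "html.parser")
p = soup.find("p")

# Tag.attrs is a plain dict of every attribute present on the tag.
attributes = dict(p.attrs)
for name, value in attributes.items():
    print(f"{name}={value}")

# Tag.get() returns None for a missing attribute instead of raising KeyError.
style = p.get("style")


def parse_inline_style(style_value):
    """Hypothetical helper: split 'a: b; c: d' into {'a': 'b', 'c': 'd'}."""
    declarations = {}
    for declaration in (style_value or "").split(";"):
        if ":" in declaration:
            prop, _, val = declaration.partition(":")
            declarations[prop.strip()] = val.strip()
    return declarations


# Empty here, because the <p> has no style attribute at all.
print(parse_inline_style(style))
```

If the markup did carry a real style attribute, passing p.get("style") to the helper would give a dict of the declared properties.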