How to get string using xpath in python bs4?
I need to get the string inside an li tag using Python and bs4. I'm trying the code below:
from bs4 import BeautifulSoup
from lxml import etree
html_doc = """
<html>
<head>
</head>
<body>
<div class="container">
<section id="page">
<div class="content">
<div class="box">
<ul>
<li>Name: Peter</li>
<li>Age: 21</li>
<li>Status: Active</li>
</ul>
</div>
</div>
</section>
</div>
</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'lxml')
dom = etree.HTML(str(soup))
print (dom.xpath('/html/body/div/section/div[1]/div[1]/ul/li[3]'))
That returns:
[<Element li at 0x7fc640e896c0>]
But the desired result is the text of the li tag, like this:
Status: Active
How can I do that?
Thanks
In XPath you just need to append text() to the path, which selects the element's text nodes instead of the element itself.
from bs4 import BeautifulSoup
from lxml import etree
html_doc = """
<html>
<head>
</head>
<body>
<div class="container">
<section id="page">
<div class="content">
<div class="box">
<ul>
<li>Name: Peter</li>
<li>Age: 21</li>
<li>Status: Active</li>
</ul>
</div>
</div>
</section>
</div>
</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'lxml')
dom = etree.HTML(str(soup))
print(dom.xpath('/html/body/div/section/div[1]/div[1]/ul/li[3]/text()'))
Output:
['Status: Active']
# or
for li in dom.xpath('/html/body/div/section/div[1]/div[1]/ul/li[3]/text()'):
    # each result is a text string; take the second whitespace-separated token
    txt = li.split()[1]
    print(txt)
Output:
Active
# or
print(' '.join(dom.xpath('/html/body/div/section/div[1]/div[1]/ul/li[3]/text()')))
Output:
Status: Active
# or
print(''.join(dom.xpath('//*[@class="box"]/ul/li[3]/text()')))
Output:
Status: Active
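Two small variations on the above, offered as a sketch rather than the one right way. For this document lxml can parse the HTML string directly, so the round trip through BeautifulSoup is not strictly needed. Wrapping the path in the XPath string() function also makes xpath() return a single string instead of a list. The variable names status, key and value are only illustrative, and the sample markup is shortened.

```python
from lxml import etree

html_doc = """
<html><body>
<div class="container"><section id="page"><div class="content">
<div class="box">
<ul>
<li>Name: Peter</li>
<li>Age: 21</li>
<li>Status: Active</li>
</ul>
</div>
</div></section></div>
</body></html>
"""

# Parse the markup with lxml's HTML parser (no BeautifulSoup step).
dom = etree.HTML(html_doc)

# string(...) converts the first matching node to its text content,
# so the result is a str rather than a list of text nodes.
status = dom.xpath('string(//div[@class="box"]/ul/li[3])')
print(status)  # Status: Active

# Split "Status: Active" into its label and its value.
key, value = status.split(": ", 1)
print(value)  # Active
```

If no element matches, string() yields an empty string rather than raising, so it is worth checking for that before splitting.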
Try the approach below (no external library required)
import xml.etree.ElementTree as ET
xml = """
<html>
<head>
</head>
<body>
<div class="container">
<section id="page">
<div class="content">
<div class="box">
<ul>
<li>Name: Peter</li>
<li>Age: 21</li>
<li>Status: Active</li>
</ul>
</div>
</div>
</section>
</div>
</body>
</html>
"""
root = ET.fromstring(xml)
print(root.find('.//ul')[-1].text)
Output:
Status: Active
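One caveat with this approach: xml.etree.ElementTree is an XML parser, so it only works when the HTML happens to be well-formed XML, as it is here. It also supports only a limited subset of XPath that does not include text(), although attribute and position predicates are available. A minimal sketch follows, with shortened markup and a deliberately malformed snippet (broken is a made-up example) to show the failure mode:

```python
import xml.etree.ElementTree as ET

xml = """
<html>
<body>
<div class="box">
<ul>
<li>Name: Peter</li>
<li>Age: 21</li>
<li>Status: Active</li>
</ul>
</div>
</body>
</html>
"""

root = ET.fromstring(xml)

# findtext() returns the text of the first element matching the path.
status = root.findtext('.//div[@class="box"]/ul/li[3]')
print(status)  # Status: Active

# last() selects the final li without hard-coding its position.
last_item = root.findtext('.//ul/li[last()]')
print(last_item)  # Status: Active

# HTML with an unclosed tag is not well-formed XML and is rejected.
broken = "<ul><li>Status: Active<br></li></ul>"
try:
    ET.fromstring(broken)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False
print(parsed_ok)  # False
```

For pages scraped from the web, which are often not well-formed, an HTML parser such as lxml or BeautifulSoup is usually the safer choice.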