I can't get the value of an HTML tag using BeautifulSoup in Python
Hey, I'm trying to scrape a website, and some of the values in input fields don't come through as text, only as HTML, like this:
<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>
So all I want is to get the value (John Doe). I tried .text, but it didn't return anything. Here is the code:
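import io
from bs4 import BeautifulSoup
# r is the HTTP response fetched earlier (not shown in the question)
soup = BeautifulSoup(r.content, 'lxml')
for name in soup.findAll('input', {'name': 'ctl00$ContentPlaceHolder1$EmpName'}):
    with io.open('x.txt', 'w', encoding="utf-8") as f:
        f.write(name.prettify())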
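The reason you get no result when calling .text is that "John Doe" is not in the text of the HTML; it is an HTML attribute: value="John Doe". You can access attributes with tag[<attribute>], the same way you would access a Python dictionary (dict). (See the BeautifulSoup documentation on attributes.)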
from bs4 import BeautifulSoup
html = """<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>"""
soup = BeautifulSoup(html, "lxml")
# Look up each matching <input> by its name attribute and print its value attribute
for name in soup.find_all("input", {"name": "ctl00$ContentPlaceHolder1$EmpName"}):
    print(name["value"])
Output:
John Doe
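To make the difference between text and attributes concrete, here is a minimal sketch using a shortened version of the same input tag. It assumes the built-in html.parser rather than lxml so that nothing beyond bs4 needs to be installed, and the "placeholder" attribute is a hypothetical example of an attribute the tag does not have.

```python
from bs4 import BeautifulSoup

# Shortened version of the <input> from the question (style/class omitted).
html = '<input id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" type="text" value="John Doe"/>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("input", {"name": "ctl00$ContentPlaceHolder1$EmpName"})

# An <input> element has no text content, so .text is an empty string.
text = tag.text

# Dict-style access returns the attribute, but raises KeyError if it is absent.
value = tag["value"]

# .get() returns None (or a default you supply) when the attribute is absent.
# "placeholder" is a hypothetical attribute that this tag does not define.
placeholder = tag.get("placeholder")
fallback = tag.get("placeholder", "n/a")

# All attributes are also available as a plain dict via .attrs.
attrs = tag.attrs

print(repr(text), value, placeholder, fallback, attrs["type"])
```

If an attribute might be missing on some pages, tag.get("value") is probably the safer choice, since tag["value"] would raise a KeyError in that case.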
While the other answer works well, it may be cleaner without the for loop (if you only want to extract a single element):
>>> soup.find('input')['value']
# John Doe
Code:
from bs4 import BeautifulSoup
string = '''
<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>
'''
soup = BeautifulSoup(string, 'html.parser')
john_come_here = soup.find('input')['value']
print(john_come_here)
# John Doe
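One caveat worth sketching: soup.find('input') returns the first input on the page, and find returns None when nothing matches. On a real page with several inputs you would likely want to narrow the search by id or name. In the example below, the second field (ContentPlaceHolder1_EmpId), the id NoSuchId, and the helper name input_value are all hypothetical, made up purely for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two inputs; only EmpName comes from the question.
page = """
<form>
  <input id="ContentPlaceHolder1_EmpId" type="text" value="12345"/>
  <input id="ContentPlaceHolder1_EmpName" type="text" value="John Doe"/>
</form>
"""
soup = BeautifulSoup(page, "html.parser")

# find('input') alone picks the first <input> in document order.
first_value = soup.find("input")["value"]


def input_value(soup, element_id):
    """Return the value attribute of the <input> with this id, or None."""
    tag = soup.find("input", id=element_id)
    # find() returns None when no element matches, so guard before reading.
    return tag.get("value") if tag is not None else None


emp_name = input_value(soup, "ContentPlaceHolder1_EmpName")
missing = input_value(soup, "NoSuchId")

print(first_value, emp_name, missing)
```

Checking for None before indexing avoids a TypeError ('NoneType' object is not subscriptable) when the element is not on the page.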