I can't get the value of an HTML tag using BeautifulSoup in Python

Question

Hey, I'm trying to scrape a website. Some of the values in the input fields aren't scraped as text; I only get the HTML, like this:

<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>

So all I want is to get the value (John Doe). I tried .text, but it doesn't pick it up. Here is the code:

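import io
from bs4 import BeautifulSoup

# r is the response from an earlier requests.get(...) call
soup = BeautifulSoup(r.content, 'lxml')
for name in soup.findAll('input', {'name': 'ctl00$ContentPlaceHolder1$EmpName'}):
    with io.open('x.txt', 'w', encoding="utf-8") as f:
        f.write(name.prettify())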

Answer 1

The reason you get nothing when calling .text is that "John Doe" is not part of the tag's text; it is an HTML attribute: value="John Doe".
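A minimal sketch of that difference, using Python's built-in html.parser and a trimmed copy of the tag from the question. .text (an alias for get_text()) only collects text that sits between tags, and <input> is a void element with no contents, so it comes back empty while the value attribute holds the data:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <input> from the question (class/style omitted for brevity).
html = '<input id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" type="text" value="John Doe"/>'

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("input")

# <input> is a void element: it has no child text, so .text is empty.
text_value = tag.text
# The data lives in the value attribute instead.
attr_value = tag["value"]

print(repr(text_value))  # ''
print(attr_value)        # John Doe
```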

You can access attributes with tag[<attribute>], the same way you index a Python dictionary (dict). (See the BeautifulSoup documentation on attributes.)

from bs4 import BeautifulSoup

html = """<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>"""

soup = BeautifulSoup(html, "lxml")
for name in soup.findAll("input", {"name": "ctl00$ContentPlaceHolder1$EmpName"}):
    print(name["value"])

Output:

John Doe
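Applied to the loop from the question, the fix is to write the attribute instead of name.prettify(). A minimal sketch, assuming the page HTML is already available as a string (the question fetches it into r.content). It also opens the file once, outside the loop, so later matches don't overwrite earlier ones, and uses .get(), which returns None instead of raising KeyError when the attribute is missing:

```python
import io
from bs4 import BeautifulSoup

# Stand-in for r.content from the question (assumption: the page is already fetched).
html = '<input name="ctl00$ContentPlaceHolder1$EmpName" type="text" value="John Doe"/>'
soup = BeautifulSoup(html, "html.parser")

values = []
# Open the file once so every match is written, not just the last one.
with io.open("x.txt", "w", encoding="utf-8") as f:
    for name in soup.find_all("input", {"name": "ctl00$ContentPlaceHolder1$EmpName"}):
        # .get() returns None instead of raising KeyError if "value" is absent.
        value = name.get("value")
        if value is not None:
            values.append(value)
            f.write(value + "\n")

print(values)  # ['John Doe']
```

name["value"] would work just as well here; .get() only matters if some matching inputs might lack the attribute.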

Answer 2

While the answer above works well, it can be more concise without the for loop (if you only want to extract one element):

>>> soup.find('input')['value']
'John Doe'

Code:

from bs4 import BeautifulSoup

string = '''
<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>
'''

soup = BeautifulSoup(string, 'html.parser')

john_come_here = soup.find('input')['value']
print(john_come_here)

# John Doe
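One caveat with soup.find('input') on its own: it returns the first <input> in the document, and on a full page that may be a different field. A minimal sketch, using a hypothetical page with a hidden input placed first, showing how to target the element by id, by name, or with a CSS selector via select_one():

```python
from bs4 import BeautifulSoup

# Hypothetical page: a hidden <input> appears before the one we want.
html = '''
<input type="hidden" name="__VIEWSTATE" value="abc123"/>
<input id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" type="text" value="John Doe"/>
'''
soup = BeautifulSoup(html, "html.parser")

# find('input') alone returns the first <input>, which is the wrong one here.
first = soup.find("input")["value"]

# Filtering by id or by name picks the intended element.
by_id = soup.find("input", id="ContentPlaceHolder1_EmpName")["value"]
by_name = soup.find("input", attrs={"name": "ctl00$ContentPlaceHolder1$EmpName"})["value"]

# Equivalent CSS selector; select_one returns None if nothing matches.
by_css = soup.select_one("input#ContentPlaceHolder1_EmpName")["value"]

print(first)   # abc123
print(by_id)   # John Doe
```

Filtering by id is usually the tightest match, since ids are meant to be unique within a page.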
