How to extract the title from an <a> tag within a DIV using Python?

Question

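I am new to Python, and I want to extract all of the titles from the <a> tags placed inside DIVs. There can be anywhere from 0 to 100 titles.

The child DIV <div class="Shl zI7 iyn Hsu"> is the one that contains the <a> tag and its title.

Here is the markup for the first main DIV, which contains all of the child DIVs: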

<div class="Eqh F6l Jea k1A zI7 iyn Hsu"><div class="Shl zI7 iyn Hsu"><a data-test-id="search-guide" 
href="" title="Search for &quot;living room colors&quot;"><div class="Jea Lfz XiG fZz gjz qDf zI7 iyn 
Hsu" style="white-space: nowrap; background-color: rgb(162, 152, 139);"><div class="tBJ dyH iFc MF7 
erh tg7 IZT mWe">Living</div></div></a>

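In the example above, I want to get "living room colors" and not everything else that comes before it in title=. I think I may need a regular expression for that later, but my problem right now is getting the title out of the parsed HTML at all.

I tried the following Python: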

import requests
from bs4 import BeautifulSoup

url = "https://www.pinterest.com/search/pins/?q=room%20color"
get_url = requests.get(url)
get_text = get_url.text
soup = BeautifulSoup(get_text, "html.parser")
DivTitle = soup.select('a.Shl.zI7.iyn.Hsu')[0].text.strip()
print(DivTitle)

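I get: IndexError: list index out of range

When I search for the keyword above, the search results show more than one title (suggested keywords).

Thanks for your help.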

Edit: OK, I got it working, but I am trying to make it parse from the URL instead of pasting the markup into my code.

This is the part I used:

import requests
from bs4 import BeautifulSoup

vgm_url = 'https://www.pinterest.com/search/pins/?q=skin%20care'
html_text = requests.get(vgm_url).text
soup = BeautifulSoup(html_text, 'html.parser')

But I get nothing back, and no error either.

Answer 1

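Your selector is wrong: the classes you are matching belong to the DIV, and the A is a child of that DIV. The title is an attribute of the A element, not its text.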

from bs4 import BeautifulSoup

data = '''\
<html>
  <head>
    <meta name="generator"
    content="HTML Tidy for HTML5 (experimental) for Windows https://github.com/w3c/tidy-html5/tree/c63cc39" />
    <title></title>
  </head>
  <body>
    <div class="Eqh F6l Jea k1A zI7 iyn Hsu">
      <div class="Shl zI7 iyn Hsu">
        <a data-test-id="search-guide" href="" title="Search for &quot;living room colors&quot;">
          <div class="Jea Lfz XiG fZz gjz qDf zI7 iyn Hsu" style="white-space: nowrap; background-color: rgb(162, 152, 139);">
            <div class="tBJ dyH iFc MF7 erh tg7 IZT mWe">Living</div>
          </div>
        </a>
      </div>
    </div>
  </body>
</html>
'''

soup = BeautifulSoup(data, 'html.parser')

a = soup.select('div.Shl.zI7.iyn.Hsu a')[0]

print(a['title'])
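
Since the question mentions anywhere from 0 to 100 titles and wanting only the quoted phrase, here is a minimal sketch of one way to extend the snippet above. The helper name extract_guide_titles, the second "bedroom colors" guide, and the assumption that the wanted phrase is always wrapped in double quotes are illustrative and not taken from the real page.

```python
import re
from bs4 import BeautifulSoup

# Sample markup modeled on the question; the second guide is hypothetical.
html = '''
<div class="Eqh F6l Jea k1A zI7 iyn Hsu">
  <div class="Shl zI7 iyn Hsu">
    <a data-test-id="search-guide" href="" title="Search for &quot;living room colors&quot;">Living</a>
  </div>
  <div class="Shl zI7 iyn Hsu">
    <a data-test-id="search-guide" href="" title="Search for &quot;bedroom colors&quot;">Bedroom</a>
  </div>
</div>
'''


def extract_guide_titles(markup):
    """Return the quoted phrase from each <a title="..."> inside a matching DIV."""
    soup = BeautifulSoup(markup, "html.parser")
    phrases = []
    # select() returns a list (possibly empty), so looping over it
    # cannot raise IndexError when there are zero matches.
    for a in soup.select("div.Shl.zI7.iyn.Hsu > a[title]"):
        title = a["title"]  # entities such as &quot; are already decoded
        # Assumption: the wanted phrase is the text between double quotes.
        match = re.search(r'"([^"]*)"', title)
        phrases.append(match.group(1) if match else title)
    return phrases


# The selector from the question looks for classes on the <a> itself,
# finds nothing, and so indexing [0] on the result fails.
soup = BeautifulSoup(html, "html.parser")
original_matches = soup.select("a.Shl.zI7.iyn.Hsu")

titles = extract_guide_titles(html)
print(original_matches)
print(titles)
```

Looping over the result of select() instead of indexing [0] is what lets this handle a page with no suggestions. If a title has no quoted part, the sketch falls back to the full attribute value.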

html

python

title

html-parsing