Can't get nested content inside many div layers using BeautifulSoup
I want to access the content of the div with the class name "ar-faculty-section-content" on https://www.fed.cuhk.edu.hk/cri/faculty/prof-yin-hong-biao/. I tried the approach below to get the target content, but it did not work.
profile = requests.get("https://www.fed.cuhk.edu.hk/cri/faculty/dr-sze-man-man-paul/")
x = BeautifulSoup(profile.text,"html.parser")
x.find_all("h5", { "class" : "ar-faculty-section-content" })
The result is as follows:
[<div class="ar-faculty-section-content" style="font-weight: 400 !important">
]
How can I get all the content inside the div, such as the h5 and li elements?
Here's how:
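import requests
from bs4 import BeautifulSoup
url = "https://www.fed.cuhk.edu.hk/cri/faculty/prof-yin-hong-biao/"
source_html = requests.get(url).text
# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(source_html, 'lxml')
# First <h5> nested anywhere inside a div with the target class
h5 = soup.select_one(".ar-faculty-section-content h5").get_text()
# Text of every <li> nested inside any element with the target class
li_elements = [li.get_text() for li in soup.select(".ar-faculty-section-content li")]
print(h5)
print("\n".join(li_elements))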
Output:
Introduction
Yin, H., & Huang, S. (2021). Applying structural equation modelling to research on teaching and teacher education: Looking back and forward. Teaching and Teacher Education. DOI: 10.1016/j.tate.2021.103438
Yin, H., & Shi, L. (2021). Which type of interpersonal interaction better facilitates college student learning and development in China: Face-to-face or online? ECNU Review of Education. DOI: 10.1177/20965311211010818
and a lot more ...
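
If you want to try the selector logic without hitting the site, here is a minimal offline sketch. The SAMPLE_HTML string and the extract_section helper are hypothetical stand-ins I made up for illustration; the real page's markup may differ. It uses the built-in "html.parser" so lxml is not needed, and it returns empty results instead of raising if the div is missing. Unlike the snippet above, it only looks inside the first matching div.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page markup; the real page may be structured differently.
SAMPLE_HTML = """
<div class="ar-faculty-section">
  <div class="ar-faculty-section-content" style="font-weight: 400 !important">
    <h5>Introduction</h5>
    <ul>
      <li>First publication</li>
      <li>Second publication</li>
    </ul>
  </div>
</div>
"""


def extract_section(html):
    """Return (heading, items) for the first matching div, or (None, []) if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Select the container div itself, then search only within it.
    section = soup.select_one("div.ar-faculty-section-content")
    if section is None:
        return None, []
    heading_tag = section.find("h5")
    heading = heading_tag.get_text(strip=True) if heading_tag else None
    items = [li.get_text(strip=True) for li in section.find_all("li")]
    return heading, items


heading, items = extract_section(SAMPLE_HTML)
print(heading)
print("\n".join(items))
```

Checking for None before calling get_text() avoids an AttributeError when a page does not contain the expected div or heading.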