How to extract the title of an 'a' tag from a table cell (td) in BeautifulSoup
Here is my BeautifulSoup code.
import requests
from bs4 import BeautifulSoup

topics_url = 'https://www.goodreads.com/review/list/47437459?page=1&ref=nav_mybooks'
response = requests.get(topics_url)
page_content = response.text
doc = BeautifulSoup(page_content, 'html.parser')
table_title = doc.find_all('td', {'class' : 'field title'})
table_title[:5]
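Output:
After getting the td elements, I want to reach the 'a' tag inside each one and extract its title. Since the 'a' tag has no class or id, how do I get the title from it?
Expected output:
A Tale of Two Cities
Sapiens: A Brief History of Humankind
and so on....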
The titles are the values of each 'a' tag's title attribute, so you can call .get('title') on the tag to read them.
import requests
from bs4 import BeautifulSoup

topics_url = 'https://www.goodreads.com/review/list/47437459?page=1&ref=nav_mybooks'
response = requests.get(topics_url)
page_content = response.text
doc = BeautifulSoup(page_content, 'html.parser')

# Walk each row of the books table and read the title attribute of its link
for table_title in doc.select('#books tbody tr'):
    t = table_title.select_one('td.field.title div a').get('title')
    print(t)
Output:
A Tale of Two Cities
Sapiens: A Brief History of Humankind
Wings of Fire: An Autobiography
Maktub
Mindset: The New Psychology of Success
The Travels of Ibn Battutah
After Dark
Norwegian Wood
Never Let Me Go
Why We Sleep: Unlocking the Power of Sleep and Dreams
Uttaradhikar
Behind the Beautiful Forevers: Life, Death, and Hope in a Mumbai Undercity
Cloud Atlas
Hillbilly Elegy: A Memoir of a Family and Culture in Crisis
Outliers: The Story of Success
The Black Swan: The Impact of the Highly Improbable
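If you would rather continue from the find_all('td', ...) list in the question, a minimal sketch along these lines may help. It runs against a small hand-written HTML snippet rather than the live page, so the markup in SAMPLE_HTML and the helper name extract_titles are assumptions for illustration only; the real Goodreads markup may differ. It also skips cells that have no 'a' tag or no title attribute instead of raising an error.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the Goodreads table (an assumption,
# not copied from the real page).
SAMPLE_HTML = """
<table id="books">
  <tbody>
    <tr><td class="field title"><div class="value">
      <a title="A Tale of Two Cities" href="/book/show/1">A Tale of Two Cities</a>
    </div></td></tr>
    <tr><td class="field title"><div class="value">
      <a href="/book/show/2">Link without a title attribute</a>
    </div></td></tr>
    <tr><td class="field title"><div class="value"></div></td></tr>
  </tbody>
</table>
"""


def extract_titles(html):
    """Return the title attribute of the first 'a' tag in each title cell."""
    doc = BeautifulSoup(html, "html.parser")
    titles = []
    for td in doc.find_all("td", {"class": "field title"}):
        link = td.find("a")          # first <a> nested anywhere inside the cell
        if link is None:
            continue                 # cell has no link at all
        title = link.get("title")    # returns None if the attribute is missing
        if title:
            titles.append(title)
    return titles


print(extract_titles(SAMPLE_HTML))
```

td.find('a') works here because find searches all descendants, so the 'a' tag does not need its own class or id.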