How to extract the title of an 'a' tag from a table (td) in BeautifulSoup

Question

Here is my BeautifulSoup code.

  import requests
  from bs4 import BeautifulSoup

  topics_url = 'https://www.goodreads.com/review/list/47437459?page=1&ref=nav_mybooks'
  response = requests.get(topics_url)
  page_content = response.text
  doc = BeautifulSoup(page_content, 'html.parser')
  # Every <td> whose class attribute is exactly "field title"
  table_title = doc.find_all('td', {'class': 'field title'})
  table_title[:5]

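Output:

After getting the td elements, I want to access the 'a' tag inside each one and extract its title. Since the 'a' tag has no class or id, how can I get the title from it?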

Expected output:

A Tale of Two Cities

Sapiens: A Brief History of Humankind

and all the rest....

Answer 1

The titles are stored in the title attribute of each 'a' tag, so you can call .get('title') on the tag to retrieve them.

import requests
from bs4 import BeautifulSoup

topics_url = 'https://www.goodreads.com/review/list/47437459?page=1&ref=nav_mybooks'
response = requests.get(topics_url)
page_content = response.text
doc = BeautifulSoup(page_content, 'html.parser')

# Walk every row of the table with id="books"
for row in doc.select('#books tbody tr'):
    # The link sits inside a <div> in the td.field.title cell;
    # its title attribute holds the book title
    title = row.select_one('td.field.title div a').get('title')
    print(title)

Output:

A Tale of Two Cities
Sapiens: A Brief History of Humankind 
Wings of Fire: An Autobiography       
Maktub
Mindset: The New Psychology of Success
The Travels of Ibn Battutah
After Dark
Norwegian Wood
Never Let Me Go
Why We Sleep: Unlocking the Power of Sleep and Dreams
Uttaradhikar
Behind the Beautiful Forevers: Life, Death, and Hope in a Mumbai Undercity
Cloud Atlas
Hillbilly Elegy: A Memoir of a Family and Culture in Crisis
Outliers: The Story of Success
The Black Swan: The Impact of the Highly Improbable
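
If you would rather start from the td cells, as in the question, the same idea can be sketched without any network access. The markup in SAMPLE_HTML and the helper name extract_titles are assumptions made up for illustration; the real Goodreads page may be structured differently. The sketch also skips cells that have no link, since select_one() and find() return None when nothing matches and calling .get() on None would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure the selectors above rely on;
# the real page may differ.
SAMPLE_HTML = """
<table id="books">
  <tbody>
    <tr>
      <td class="field title"><div class="value">
        <a title="A Tale of Two Cities" href="/book/show/1">A Tale of Two Cities</a>
      </div></td>
    </tr>
    <tr>
      <td class="field title"><div class="value">
        <a title="Sapiens: A Brief History of Humankind" href="/book/show/2">Sapiens</a>
      </div></td>
    </tr>
    <tr>
      <td class="field title"><div class="value">No link in this cell</div></td>
    </tr>
  </tbody>
</table>
"""


def extract_titles(html):
    """Return the title attribute of the first <a> in each td.field.title cell."""
    doc = BeautifulSoup(html, "html.parser")
    titles = []
    for cell in doc.select("td.field.title"):
        link = cell.find("a")  # first <a> anywhere inside the cell, or None
        if link is None:
            continue  # cell has no link, nothing to extract
        title = link.get("title")  # None if the attribute is absent
        if title:
            titles.append(title)
    return titles


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

Using .get('title') rather than link['title'] means a link without a title attribute yields None instead of raising a KeyError.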
