Why does select_one in BeautifulSoup return None?
I am using Python 3 and BeautifulSoup.
My code below returns a list of None values:
import bs4 as bs
import urllib.request
import pandas as pd
from requests_html import HTMLSession

review_dict = {'review':[], 'author':[]}
page = 1
while page != 10:
    session = HTMLSession()
    url = 'https://www.goodreads.com/book/show/2932708?from_search=true&from_srp=true&qid=OOQwYQkG9A&rank=1' + str(++page)
    grURL = session.get(url)
    soup = bs.BeautifulSoup(grURL.content, 'html.parser')
    prod_containers = soup.find('div', id = 'lazy_loadable_view')
    firstelement = prod_containers.find_all('div', attrs={'class': 'left bodycol'})
    for rows in firstelement:
        review = rows.select_one('p > div.reviewText stacked > span.readable > span')
        author = rows.select_one('p > div.reviewHeader uitext stacked > span > a[title]')
        review_dict['review'].append(review)
        review_dict['author'].append(author)
    if page == 10:
        break
    page += 1
sword_reviews = pd.DataFrame(review_dict)
sword_reviews
When I use .text, Jupyter Notebook gives me this error:
AttributeError: 'NoneType' object has no attribute 'text'
How can I adjust my code to correctly scrape the reviews and the reviewers' names?
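To get the data you want, change the selector string passed to select_one. When an element has several classes, join them with '.' rather than a space. In a CSS selector a space means "descendant", so 'div.reviewText stacked' looks for a <stacked> tag inside div.reviewText, matches nothing, and select_one returns None.
Try this code: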
import bs4 as bs
import pandas as pd
from requests_html import HTMLSession

review_dict = {'review': [], 'author': []}
page = 1
while page != 10:
    session = HTMLSession()
    # Python has no ++ operator: ++page is just +(+page) and leaves page unchanged,
    # so plain page builds exactly the same URL as the original code.
    url = 'https://www.goodreads.com/book/show/2932708?from_search=true&from_srp=true&qid=OOQwYQkG9A&rank=1' + str(page)
    grURL = session.get(url)
    soup = bs.BeautifulSoup(grURL.content, 'html.parser')
    prod_containers = soup.find('div', id='lazy_loadable_view')
    firstelement = prod_containers.find_all('div', attrs={'class': 'left bodycol'})
    for rows in firstelement:
        # Classes on the same element are chained with '.', not separated by spaces.
        review = rows.select_one('div.reviewText.stacked > span.readable > span')
        author = rows.select_one('div.reviewHeader.uitext.stacked > span > a[title]')
        review_dict['review'].append(review)
        review_dict['author'].append(author)
    # The loop condition already stops at page 10, so no extra break is needed.
    page += 1
sword_reviews = pd.DataFrame(review_dict)
print(sword_reviews)
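To see the difference between the two selector styles without hitting the network, here is a minimal sketch that runs both against a small inline document. The markup in SAMPLE_HTML is hypothetical, loosely modelled on the class names used in the selectors above, and the helper text_or_none is an illustrative name rather than part of BeautifulSoup. It also shows one way to avoid the AttributeError by checking for None before reading the text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the class names in the selectors above.
SAMPLE_HTML = """
<div class="left bodycol">
  <div class="reviewHeader uitext stacked">
    <span><a title="Jane Doe" href="#">Jane Doe</a></span>
  </div>
  <div class="reviewText stacked">
    <span class="readable"><span>Great book.</span></span>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
row = soup.select_one("div.left.bodycol")

# A space is the descendant combinator, so this looks for a <stacked> tag
# nested inside div.reviewText, which does not exist in the document.
broken = row.select_one("div.reviewText stacked > span.readable > span")

# Chaining with '.' requires all listed classes on the same <div>.
fixed = row.select_one("div.reviewText.stacked > span.readable > span")
author = row.select_one("div.reviewHeader.uitext.stacked > span > a[title]")


def text_or_none(tag):
    """Return the stripped text of a tag, or None when select_one found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


print(broken)                # None
print(text_or_none(fixed))   # Great book.
print(text_or_none(author))  # Jane Doe
```

In the scraping loop, appending text_or_none(review) and text_or_none(author) instead of the raw tags would put plain strings in the DataFrame and leave None wherever a row has no match.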