Scrape issue in Beautiful Soup: 'NoneType' object has no attribute 'find_all'
I'm trying to run this code to scrape the specific websites/RSS feeds mentioned below, and I keep getting:

Traceback (most recent call last):
  File "C:\Users\Jeanne\Desktop\PYPDIT\pyscape.py", line 28, in <module>
    transcripts = [url_to_transcript(u) for u in urls]
  File "C:\Users\Jeanne\Desktop\PYPDIT\pyscape.py", line 28, in <listcomp>
    transcripts = [url_to_transcript(u) for u in urls]
  File "C:\Users\Jeanne\Desktop\PYPDIT\pyscape.py", line 17, in url_to_transcript
    text = [p.text for p in soup.find(class_="itemcontent").find_all('p')]
AttributeError: 'NoneType' object has no attribute 'find_all'

Please advise.
import requests
from bs4 import BeautifulSoup
import pickle

def url_to_transcript(url):
    page = requests.get(url).text
    soup = BeautifulSoup(page, "lxml")
    text = [p.text for p in soup.find(class_="itemcontent").find_all('p')]
    print(url)
    return text

# URLs of the transcripts in scope
urls = ['http://feeds.nos.nl/nosnieuwstech',
        'http://feeds.nos.nl/nosnieuwsalgemeen']

transcripts = [url_to_transcript(u) for u in urls]
The HTML returned is not the same as what you see on the page, so there is no element with class "itemcontent" and soup.find() returns None. You could use the following instead:
import requests
from bs4 import BeautifulSoup
# import pickle

urls = ['http://feeds.nos.nl/nosnieuwstech', 'http://feeds.nos.nl/nosnieuwsalgemeen']

with requests.Session() as s:
    for url in urls:
        page = s.get(url).text
        soup = BeautifulSoup(page, "lxml")
        print(url)
        print([[i.text for i in desc.select('p')] for desc in soup.select('description')[1:]])
        print('--' * 100)
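More generally, the AttributeError means soup.find(class_="itemcontent") returned None because no matching element exists in the fetched document, and None has no find_all method. A defensive pattern is to check the find() result before chaining. A minimal sketch on an inline sample document (the markup and helper name are made up for illustration; it uses Python's built-in "html.parser" so lxml isn't required):

```python
from bs4 import BeautifulSoup

def extract_paragraphs(html, class_name):
    """Return the text of every <p> inside the first element with
    the given class, or an empty list if no such element exists."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(class_=class_name)
    if container is None:  # guard: avoids AttributeError on a missing element
        return []
    return [p.text for p in container.find_all('p')]

sample = '<div class="itemcontent"><p>one</p><p>two</p></div>'
print(extract_paragraphs(sample, "itemcontent"))  # ['one', 'two']
print(extract_paragraphs(sample, "missing"))      # []
```

Returning an empty list for a missing container lets the calling list comprehension keep running instead of crashing mid-loop.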