Praw Reddit Web Crawler Error "object has no attribute"
I'm trying to use a Reddit web crawler to extract the top comments from specific subreddits and save them to a .csv file.
Here is my code:
import datetime
import praw
import pandas as pd

reddit = praw.Reddit(client_id='',
                     client_secret='',
                     password='',
                     user_agent='',
                     username='')

# Reddit Crawler
list_counter = 0
sr_list = ['politics', 'conservative', 'liberal', 'libertarian', 'donaldtrump', 'joebiden', 'democrats', 'republican']
data = []
comments = []

while list_counter < len(sr_list):
    subreddit = reddit.subreddit(sr_list[list_counter]).top(time_filter='month', limit=1)  # Change limit for number of threads, .new/.hot
    for s in subreddit:  # for every submission in the subreddit
        # fetch top-level comments
        for c in s.comments:
            c_time = datetime.datetime.fromtimestamp(c.created_utc)  # Convert format of comment time (Y:M:D, H:M:S)
            comments.append([c.subreddit, c._submission, 'Comment', s.title, c.author, c_time, c.score, c.body])
            # May not need c._submission
        s_time = datetime.datetime.fromtimestamp(s.created_utc)  # Convert format of thread time
        data.append([s.subreddit, s.id, 'Thread', s.title, s.author, s_time, s.score, s.selftext])
    list_counter += 1

# Export to CSV
df = pd.DataFrame(data, columns=['Subreddit Name', 'Thread ID', 'Thread/Comment', 'Thread Title', 'Author',
                                 'Timestamp', 'Score', 'Content'])
df1 = pd.DataFrame(comments, columns=['Subreddit Name', 'Thread ID', 'Thread/Comment', 'Thread Title', 'Author',
                                      'Timestamp', 'Score', 'Content'])
result = pd.concat([df, df1])
result.to_csv('Raw Data.csv', index=False)
The code works fine most of the time, but when there are more posts and comments it returns this error message:
Traceback (most recent call last):
File "/Users/robin/Documents/Python/code/Jonckr Reddit Web Crawler.py", line 23, in <module>
c_time = datetime.datetime.fromtimestamp(c.created_utc) #Convert format of comment time (Y:M:D , H:M:S)
AttributeError: 'MoreComments' object has no attribute 'created_utc'
Process finished with exit code 1
I'm pretty much an amateur at programming, so I don't know how to fix this. Any help would be greatly appreciated.
Thanks in advance.
The Praw docs state that these MoreComments objects represent the "load more comments" and "continue this thread" links encountered on Reddit.
To handle them, the docs suggest the following:
from praw.models import MoreComments

for top_level_comment in submission.comments:
    if isinstance(top_level_comment, MoreComments):
        continue
In the context of your code, try the following:
for c in s.comments:
    if isinstance(c, MoreComments):
        continue
    c_time = datetime.datetime.fromtimestamp(c.created_utc)  # Convert format of comment time (Y:M:D, H:M:S)
    ...
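To see why the isinstance guard fixes the AttributeError, here is a minimal, self-contained sketch. FakeComment and FakeMoreComments are hypothetical stand-in classes (not part of PRAW) that mimic a real Comment and the MoreComments placeholder, so this runs without Reddit credentials:

```python
import datetime

# Hypothetical stubs, not PRAW classes: FakeComment has the attributes the
# loop reads; FakeMoreComments, like praw's MoreComments, has no created_utc.
class FakeComment:
    def __init__(self, created_utc, body):
        self.created_utc = created_utc
        self.body = body

class FakeMoreComments:
    pass

def top_level_rows(comments):
    """Collect [timestamp, body] rows, skipping placeholder objects."""
    rows = []
    for c in comments:
        if isinstance(c, FakeMoreComments):  # real code: isinstance(c, MoreComments)
            continue  # skipping avoids AttributeError on c.created_utc
        rows.append([datetime.datetime.fromtimestamp(c.created_utc), c.body])
    return rows

mixed = [FakeComment(1609459200, "first"),
         FakeMoreComments(),
         FakeComment(1609462800, "second")]
print(len(top_level_rows(mixed)))  # prints 2 -- the placeholder is skipped
```

Alternatively, PRAW's CommentForest offers `s.comments.replace_more(limit=0)`, which removes the MoreComments placeholders before you iterate (a higher limit resolves them into real comments instead), so no guard is needed in the loop.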