Error When Collecting Reddit Data With PSAW
I am trying to collect the latest Reddit comments using the PSAW library:
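import pandas as pd
from psaw import PushshiftAPI

api = PushshiftAPI()
# Request only the fields we need; search_comments returns a generator
my_reddit_comments = api.search_comments(filter=['id', 'author', 'body', 'subreddit'], limit=100000)
# Each result exposes its fields as a dict via .d_
data = pd.DataFrame(k.d_ for k in my_reddit_comments)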
I keep getting the following error:
ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
Any ideas?
It turned out to be related to Pushshift's query limits. I wrote this to work around the problem:
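import time

while True:
    try:
        my_reddit_comments = api.search_comments(filter=['id', 'author', 'body', 'subreddit'], limit=100000)
        # The generator is consumed here, so network errors surface inside the try block
        data = pd.DataFrame(k.d_ for k in my_reddit_comments)
        break
    except Exception:
        # Catch Exception rather than using a bare except, so Ctrl+C still interrupts the loop
        print('Vahid is speaking: Max Retries reached. Sleeping for 1 minute', flush=True)
        time.sleep(60)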
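A loop like this retries forever if the failure is permanent. As a rough sketch (not part of PSAW), the same idea can be written with a cap on attempts and a wait that doubles after each failure. The helper name retry_with_backoff and its parameters max_attempts, base_delay, and sleep are made up for illustration:

```python
import time


def retry_with_backoff(fetch, max_attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is exhausted.

    Hypothetical helper: the wait doubles after each failure
    (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # ideally narrowed to the network errors you expect
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc!r}); sleeping {delay:.0f}s", flush=True)
            sleep(delay)
```

With the api object from the question, it could be called as data = retry_with_backoff(lambda: pd.DataFrame(k.d_ for k in api.search_comments(filter=['id', 'author', 'body', 'subreddit'], limit=100000))). Each retry restarts the whole query from the beginning, just like the loop above.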