Error When Collecting Reddit Data With PSAW

I am trying to collect the latest Reddit comments using the PSAW library:

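import pandas as pd
from psaw import PushshiftAPI

api = PushshiftAPI()
# Request the most recent comments, keeping only the listed fields
my_reddit_comments = api.search_comments(filter=['id', 'author', 'body', 'subreddit'], limit=100000)
# Each result exposes its fields as a dict through the .d_ attribute
data = pd.DataFrame(k.d_ for k in my_reddit_comments)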

I keep getting the following error:

ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

Any ideas?

It turned out to be caused by Pushshift's query limits. I wrote this to work around the problem:

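import time

while True:
    try:
        my_reddit_comments = api.search_comments(filter=['id', 'author', 'body', 'subreddit'], limit=100000)
        data = pd.DataFrame(k.d_ for k in my_reddit_comments)
        break  # the whole result set was read without the connection dropping
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print('Vahid is speaking: Max Retries reached. Sleeping for 1 minute', flush=True)
        time.sleep(60)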
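
The loop above retries forever with a fixed one-minute pause. A bounded variant with exponential backoff might look like the sketch below. The helper name `retry_with_backoff` and its parameters are my own, not part of PSAW or Pushshift, and the delays are arbitrary starting points rather than documented rate limits.

```python
import time


def retry_with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure.

    Hypothetical helper: fetch is any zero-argument callable. The sleep
    function is injectable so the helper can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:
            # Give up and re-raise once the last allowed attempt has failed
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... capped at max_delay
            delay = min(base_delay * 2 ** attempt, max_delay)
            print(f'Attempt {attempt + 1} failed ({exc!r}); sleeping {delay:.0f}s', flush=True)
            sleep(delay)


# Possible usage with the objects from the question (needs network access):
# data = retry_with_backoff(lambda: pd.DataFrame(
#     k.d_ for k in api.search_comments(filter=['id', 'author', 'body', 'subreddit'], limit=100000)))
```

Because the generator is consumed inside the callable, a connection that breaks halfway through the download is retried from the start, just as in the loop above. Capping the number of attempts means a persistent outage surfaces as an exception instead of hanging silently.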