How to create a DataFrame from a Reddit API loop and manage the list

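I'm very new to the Reddit API (PRAW/PSAW), Python, and programming in general. What I'm trying to do is get the top submissions from certain subreddits within a 6-month window, then turn the list into a DataFrame and later into a CSV file.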

I want to:

  1. Get the length of the list
  2. Sort it by date (epoch)
  3. Make a DataFrame out of it

What I've tried so far:

list_submission = []
for submission in reddit.subreddit('bitcoin').top(limit=None):
    if submission.created_utc >=1569902400 and submission.created_utc <=1585627200:
        print(submission.created_utc, submission.title, submission.score, submission.id) # This seems to get me the data I want.
        len() # I want to check the length, but it doesn't work. It just gives me a row of zeroes.
        sorted(submission.created_utc) # This also doesn't work. It says 'float' object is not iterable. 
                                       # I tried converting to int, but also didn't work.
pd.DataFrame(list_submission) # Also doesn't work.

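In short, I suspect that making a DataFrame out of it would also solve the first two problems, but being able to do them in code would still help me when evaluating the list!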

To answer the 3 parts of your question:

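  1. To get the length of a list, you need to pass the list you want to measure to the len() function, so for list_submission you would write len(list_submission). Right now you're calling it with nothing to measure, so it can't report the size of your list (in standard Python a bare len() raises a TypeError).
  2. If a submission meets your criteria, add it to the list with list_submission.append(submission). Once the for loop has finished, you can sort the whole list with sorted(). You need to pass in the whole list plus a key to sort on, so it looks like sorted(list_submission, key=lambda submission: submission.created_utc). The error you got comes from passing a single float (submission.created_utc) where sorted() expects something iterable. Note that sorted() returns a new list rather than changing the original, so assign the result to a name.
  3. Your approach to converting the list to a DataFrame is nearly there, but your list is empty because nothing is ever appended to it, and pandas needs each row as plain values rather than a Submission object. Pull the fields out first, for example [(s.created_utc, s.title, s.score, s.id) for s in list_submission], and then set the column names with columns = ['created_utc', 'title', 'score', 'id'].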
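
The first two points are easier to see on data you can run without Reddit credentials, so here is a minimal sketch. The Post namedtuple and its sample values are made up for illustration and only stand in for PRAW submissions; the len() and sorted() calls are the same ones you would use on the real list.

```python
from collections import namedtuple

# Hypothetical stand-in for a PRAW submission; only the fields used here.
Post = namedtuple("Post", ["created_utc", "title", "score", "id"])

# Made-up sample data, deliberately out of date order.
list_submission = [
    Post(1580000000.0, "second", 120, "bbb"),
    Post(1570000000.0, "first", 300, "aaa"),
    Post(1585000000.0, "third", 50, "ccc"),
]

# len() needs the list as its argument.
count = len(list_submission)

# sorted() returns a new list; the original list is left unchanged.
by_date = sorted(list_submission, key=lambda post: post.created_utc)

print(count)
print([post.title for post in by_date])
```

If you don't need to keep the original order, list_submission.sort(key=...) sorts in place instead.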

The final code looks like this:

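import pandas as pd

# Assumes `reddit` is an already-authenticated praw.Reddit instance.
list_submission = []
for submission in reddit.subreddit('bitcoin').top(limit=None):
    # Keep only submissions inside the epoch-second window from the question.
    if 1569902400 <= submission.created_utc <= 1585627200:
        print(submission.created_utc, submission.title, submission.score, submission.id)
        list_submission.append(submission)

print(len(list_submission))  # how many submissions matched

# sorted() returns a new list, so assign the result back.
list_submission = sorted(list_submission, key=lambda submission: submission.created_utc)

# pandas needs plain values per row, not Submission objects.
rows = [(s.created_utc, s.title, s.score, s.id) for s in list_submission]
df = pd.DataFrame(rows, columns=['created_utc', 'title', 'score', 'id'])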
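
Since you mentioned wanting a CSV file later, here is a rough sketch of that last step. The rows are made-up values standing in for the fields pulled from each submission, and the extra "created" column is just an optional convenience I added, not something your code needs.

```python
import pandas as pd

# Made-up rows standing in for (created_utc, title, score, id) from each submission.
rows = [
    (1570000000.0, "first", 300, "aaa"),
    (1580000000.0, "second", 120, "bbb"),
]
df = pd.DataFrame(rows, columns=["created_utc", "title", "score", "id"])

# Optional: a readable UTC timestamp next to the raw epoch seconds.
df["created"] = pd.to_datetime(df["created_utc"], unit="s", utc=True)

# With no path given, to_csv() returns the CSV text as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a filename of your choosing, e.g. df.to_csv("submissions.csv", index=False).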