How to create a DataFrame from a Reddit API loop and manage the list
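I'm very new to the Reddit API (PRAW/PSAW), to Python, and to programming in general. I'm trying to get the top submissions from certain subreddits over a 6-month period, turn the list into a DataFrame, and later save it as a CSV file.
I want to:
- get the length of the list
- sort it by date (epoch)
- make a DataFrame from it
What I've tried so far: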
    list_submission = []
    for submission in reddit.subreddit('bitcoin').top(limit=None):
        if submission.created_utc >=1569902400 and submission.created_utc <=1585627200:
            print(submission.created_utc, submission.title, submission.score, submission.id) # This seems to get me the data I want.
            len() # I want to check the length, but it doesn't work. It just gives me a row of zeroes.
            sorted(submission.created_utc) # This also doesn't work. It says 'float' object is not iterable.
            # I tried converting to int, but also didn't work.
    pd.DataFrame(list_submission) # Also doesn't work.
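In short, I guess that making a DataFrame from this would also solve the first two problems, although I think being able to do them in code would help when evaluating the list!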
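To answer the three parts of your question:
- To get the length of a list, pass the list you want to measure to the built-in len() function. For list_submission, that is len(list_submission). Right now you are calling len() without the list, and nothing is ever appended to list_submission, so you are effectively asking for the length of nothing. That is why you see zeros.
- When a submission meets the criteria, add it to the list with list_submission.append(submission). Once the for loop has finished, you can sort the whole list with sorted(). Pass in the whole list plus the key to sort by, like this: sorted(list_submission, key=lambda submission: submission.created_utc). Note that sorted() returns a new list, so assign the result to a variable. You got the error because you passed the wrong argument: submission.created_utc is a single float, and sorted() expects an iterable.
- Calling pd.DataFrame() on the list is the right approach, but the list needs to hold rows of plain values rather than whole submission objects. Build one (created_utc, title, score, id) tuple per submission, then set the column names with columns = ['created_utc', 'title', 'score', 'id'].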
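The first two points can be tried without touching the Reddit API. The sketch below is only an illustration: it swaps real PRAW submissions for hypothetical SimpleNamespace stand-ins with made-up values, and the names START, END, and ordered are assumptions, not part of the original code.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for PRAW submissions; the values are made up.
submissions = [
    SimpleNamespace(created_utc=1580000000.0, title="b", score=10, id="x2"),
    SimpleNamespace(created_utc=1500000000.0, title="too old", score=99, id="x0"),
    SimpleNamespace(created_utc=1570000000.0, title="a", score=5, id="x1"),
]

START, END = 1569902400, 1585627200  # epoch bounds taken from the question

list_submission = []
for submission in submissions:
    # Keep only submissions created inside the window.
    if START <= submission.created_utc <= END:
        list_submission.append(submission)

# len() needs the list as its argument.
count = len(list_submission)

# sorted() takes the whole list plus a key, and returns a new list.
ordered = sorted(list_submission, key=lambda s: s.created_utc)
ordered_ids = [s.id for s in ordered]

print(count, ordered_ids)
```

Because sorted() leaves list_submission untouched, the result has to be kept in a variable (here, ordered). Calling list_submission.sort(key=...) instead would sort the list in place.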
The final code looks like this:
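    import pandas as pd

    # `reddit` is assumed to be an authenticated praw.Reddit instance created earlier.
    list_submission = []
    for submission in reddit.subreddit('bitcoin').top(limit=None):
        if 1569902400 <= submission.created_utc <= 1585627200:
            print(submission.created_utc, submission.title, submission.score, submission.id)
            list_submission.append(submission)
    print(len(list_submission))
    # sorted() returns a new list, so keep the result.
    list_submission = sorted(list_submission, key=lambda submission: submission.created_utc)
    # Build one row of plain values per submission so the four columns line up.
    rows = [(s.created_utc, s.title, s.score, s.id) for s in list_submission]
    df = pd.DataFrame(rows, columns = ['created_utc', 'title', 'score', 'id'])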
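The question also mentions saving the result as a CSV file later. A minimal sketch of that last step, assuming rows of plain values in (created_utc, title, score, id) order; the two rows here are made up, and the extra "created" column is an illustrative addition rather than something the original code produces.

```python
import pandas as pd

# Hypothetical rows in (created_utc, title, score, id) order; values are made up.
rows = [
    (1569902400.0, "first post", 12, "abc123"),
    (1585627200.0, "last post", 34, "def456"),
]
df = pd.DataFrame(rows, columns=["created_utc", "title", "score", "id"])

# Add a readable UTC timestamp next to the raw epoch seconds.
df["created"] = pd.to_datetime(df["created_utc"], unit="s", utc=True)

# With no path, to_csv returns the CSV text; pass a filename to write a file instead.
csv_text = df.to_csv(index=False)

print(df.shape)
print(csv_text)
```

The two timestamps used here are the question's own bounds, which decode to 2019-10-01 04:00 UTC and 2020-03-31 04:00 UTC, so the filter covers roughly six months.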