Length of timeline() of Twitter API
I'm trying to get all tweets from a specific user:
import json

from twarc.client2 import Twarc2


def get_all_tweets(user_id, DEBUG):
    # Your bearer token here
    t = Twarc2(bearer_token="blah")
    # Initialize a list to hold all the response pages
    alltweets = []
    new_tweets = {}
    if DEBUG:
        # Debug: read from file
        with open('tweets_debug.txt') as f:
            new_tweets = json.load(f)
        alltweets.extend(new_tweets)
    else:
        # make initial request for most recent tweets (3200 is the maximum allowed count)
        new_tweets = t.timeline(user=user_id)
        # save most recent tweets
        alltweets.extend(new_tweets)
    if DEBUG:
        # Debug: write to file
        with open("tweets_debug.txt", "w") as f:
            f.write(json.dumps(alltweets, indent=2, sort_keys=False))
    # Save the id of the oldest tweet less one
    oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)
    # Keep grabbing tweets until there are no tweets left to grab
    while len(dict(new_tweets)) > 0:
        print(f"getting tweets before {oldest}")
        # All subsequent requests use the until_id param to prevent duplicates
        new_tweets = t.timeline(user=user_id, until_id=oldest)
        # Save most recent tweets
        alltweets.extend(new_tweets)
        # Update the id of the oldest tweet less one
        oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)
        print(f"...{len(alltweets)} tweets downloaded so far")
    res = []
    for tweetlist in alltweets:
        res.extend(tweetlist['data'])
    with open("output.txt", "w") as f:
        f.write(json.dumps(res, indent=2, sort_keys=False))
    return res
However, len(dict(new_tweets)) does not work: it always returns 0. sum(1 for dummy in new_tweets) also returns 0.
I tried json.load(new_tweets), but that doesn't work either.
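json.load() is meant for reading JSON text from a file-like object: it calls the object's read() method and parses the result. The timeline generator already yields decoded Python dicts, so there is nothing to parse, and the call fails before any parsing happens. A small sketch, using a hypothetical pages() generator as a stand-in for timeline():

```python
import io
import json


def pages():
    # Hypothetical stand-in for Twarc2.timeline(): yields already-decoded dict pages
    yield {"data": [{"id": "2"}, {"id": "1"}], "meta": {"oldest_id": "1"}}


# json.load() calls .read() on its argument, which a generator does not have.
error = None
try:
    json.load(pages())
except AttributeError as exc:
    error = str(exc)

# json.load() does work on a real file-like object containing JSON text.
loaded = json.load(io.StringIO('[{"data": [], "meta": {}}]'))
```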
On the other hand, alltweets.extend(new_tweets) works fine.
It seems that timeline() returns a generator (<generator object Twarc2._timeline at 0x000001D78B3D8B30>). Is there any way to get its length, so I can tell whether there are still tweets left to fetch?
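A generator does not know how many items it will produce until it has been iterated, so it has no length at all. If the only question is "is it empty?", one common approach is to pull the first item with next() and chain it back in front of the rest. A minimal sketch; fake_timeline and peek are hypothetical helpers, not part of twarc:

```python
from itertools import chain


def fake_timeline(pages):
    # Hypothetical stand-in for Twarc2.timeline(): yields pages lazily
    for page in pages:
        yield page


gen = fake_timeline([{"data": [1, 2]}, {"data": [3]}])

# Generators have no __len__, so len() raises TypeError.
try:
    len(gen)
except TypeError:
    has_len = False
else:
    has_len = True


def peek(iterator):
    """Return (is_empty, iterator) without losing the first item."""
    sentinel = object()
    first = next(iterator, sentinel)
    if first is sentinel:
        return True, iter(())
    # Put the consumed item back in front of the remaining ones.
    return False, chain([first], iterator)


empty, gen = peek(gen)
remaining = list(gen)  # both pages are still there
empty2, _ = peek(fake_timeline([]))
```

Peeking only consumes one item, but if the pages are needed anyway, converting the generator to a list is simpler.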
Alternatively, is there a way to combine...
someList = []
someList.extend(new_tweets)
while len(someList) > 0:
    # blah blah
    ...
...with the while into a single line?
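Python 3.8 added assignment expressions (the := operator from PEP 572), which let the loop fetch the pages, materialize them into a list, and test that list in a single line. A minimal sketch; fake_timeline is a hypothetical stand-in that yields every page older than until_id and nothing once the history is exhausted:

```python
def fake_timeline(until_id=None):
    # Hypothetical stand-in for Twarc2.timeline(user=..., until_id=...):
    # yields pages of ids older than until_id, two ids per page.
    ids = [5, 4, 3, 2, 1]
    older = [i for i in ids if until_id is None or i < until_id]
    for start in range(0, len(older), 2):
        chunk = older[start:start + 2]
        yield {"data": chunk, "meta": {"oldest_id": chunk[-1]}}


all_pages = []
oldest = None
# The assignment expression materializes the generator into a list and
# tests the list's truthiness in the loop header itself.
while new_pages := list(fake_timeline(until_id=oldest)):
    all_pages.extend(new_pages)
    oldest = all_pages[-1]["meta"]["oldest_id"]
```

The loop stops as soon as a request comes back with no pages, because an empty list is falsy.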
Edit: I tried print(list(new_tweets)) right before the while loop, and it printed []. It looks like the object really is empty.
Is it because alltweets.extend(new_tweets) somehow consumed the new_tweets generator...?
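That suspicion is consistent with how generators behave: they can be iterated only once, and extend() iterates to the end. Every later attempt to iterate (list(), sum(), dict()) sees nothing, which is why each of them reports 0 or []. A small sketch with a hypothetical one-shot generator:

```python
def fake_timeline():
    # Hypothetical stand-in for Twarc2.timeline(): a one-shot generator of pages
    yield {"data": ["a", "b"]}
    yield {"data": ["c"]}


gen = fake_timeline()
alltweets = []
alltweets.extend(gen)              # iterates the generator to the end
after_extend = list(gen)           # already exhausted, so this is []
count_after = sum(1 for _ in gen)  # 0 for the same reason
dict_len = len(dict(gen))          # dict() of an empty iterable is {}, so 0

# A list, unlike a generator, can be iterated as many times as needed.
pages = list(fake_timeline())
saved = []
saved.extend(pages)
still_there = len(pages)           # 2
```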
I figured it out myself. The problem can be solved by converting the generator to a list:
new_tweets = list(t.timeline(user=user_id,until_id=oldest))
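Applied to the whole function, the same list() conversion is also needed on the initial t.timeline(user=user_id) call, because the while condition is evaluated first against that call's result, which extend() has already drained. Once new_tweets is a list, while new_tweets: is enough. len(dict(new_tweets)) is not a length check for a list of page dicts: dict() expects key/value pairs, so it either raises ValueError or silently builds a nonsense mapping. Below is a minimal sketch under these assumptions. The DEBUG branch and file I/O are omitted. The timeline parameter and fake_timeline are hypothetical, added so the sketch runs offline. Passing Twarc2(bearer_token=...).timeline in their place should work, since the sketch only uses the user and until_id keyword arguments from the question, but it has not been tested against the live API.

```python
def get_all_tweets(timeline, user_id):
    # Materialize each generator so it can be both tested and stored.
    new_tweets = list(timeline(user=user_id))
    alltweets = list(new_tweets)
    while new_tweets:
        # Same "oldest id less one" bookkeeping as in the question.
        oldest = str(int(alltweets[-1]["meta"]["oldest_id"]) - 1)
        new_tweets = list(timeline(user=user_id, until_id=oldest))
        alltweets.extend(new_tweets)
    # Flatten the pages into a single list of tweet objects.
    return [tweet for page in alltweets for tweet in page["data"]]


def fake_timeline(user, until_id=None):
    # Hypothetical stand-in for Twarc2.timeline(): yields every page of
    # tweets older than until_id, newest first, two tweets per page.
    ids = ["105", "104", "103", "102", "101"]
    older = [i for i in ids if until_id is None or int(i) < int(until_id)]
    for start in range(0, len(older), 2):
        chunk = older[start:start + 2]
        yield {"data": [{"id": i} for i in chunk], "meta": {"oldest_id": chunk[-1]}}


result = get_all_tweets(fake_timeline, "12345")
```

If the first request returns nothing, the loop body never runs, so alltweets[-1] is never read on an empty list.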