Length of timeline() of Twitter API

Question

I'm trying to get all tweets from a specific user:

import json

from twarc import Twarc2


def get_all_tweets(user_id, DEBUG):
    # Your bearer token here
    t = Twarc2(bearer_token="blah")

    # Initialize a list to hold all the Twarc2 response pages
    alltweets = []
    new_tweets = {}

    if DEBUG:
        # Debug: read from file
        with open("tweets_debug.txt") as f:
            new_tweets = json.load(f)
        alltweets.extend(new_tweets)
    else:
        # Make the initial request for the most recent tweets (3200 is the maximum allowed count)
        new_tweets = t.timeline(user=user_id)
        # save most recent tweets
        alltweets.extend(new_tweets)

    if DEBUG:
        # Debug: write to file
        with open("tweets_debug.txt", "w") as f:
            f.write(json.dumps(alltweets, indent=2, sort_keys=False))

    # Save the id of the oldest tweet less one
    oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)

    # Keep grabbing tweets until there are no tweets left to grab
    while len(dict(new_tweets)) > 0:
        print(f"getting tweets before {oldest}")
        
        # All subsequent requests use the until_id param to prevent duplicates
        new_tweets = t.timeline(user=user_id,until_id=oldest)
        
        # Save most recent tweets
        alltweets.extend(new_tweets)
        
        # Update the id of the oldest tweet less one
        oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)
        
        print(f"...{len(alltweets)} tweets downloaded so far")
    
    res = []
    for tweetlist in alltweets:
        res.extend(tweetlist['data'])
    
    with open("output.txt", "w") as f:
        f.write(json.dumps(res, indent=2, sort_keys=False))
    
    return res

However, len(dict(new_tweets)) doesn't work. It always returns 0. sum(1 for dummy in new_tweets) also returns 0.

I tried json.load(new_tweets), but that doesn't work either.

On the other hand, alltweets.extend(new_tweets) works fine.

It seems that timeline() returns a generator (<generator object Twarc2._timeline at 0x000001D78B3D8B30>). Is there any way to get its length, so I can tell whether there are still tweets left to fetch?
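A generator has no len(); the only way to learn whether it will yield anything is to pull an item from it. A minimal, standard-library-only sketch that reads one item with next() and pushes it back with itertools.chain, so nothing is lost. has_items and fake_timeline are hypothetical names; fake_timeline only imitates the page shape used in the question, not twarc itself.

```python
from itertools import chain
from typing import Iterator, Tuple, TypeVar

T = TypeVar("T")

_MISSING = object()  # sentinel, distinct from any value a generator could yield


def has_items(gen: Iterator[T]) -> Tuple[bool, Iterator[T]]:
    """Check whether gen yields anything, without losing the item that was read."""
    first = next(gen, _MISSING)
    if first is _MISSING:
        return False, iter(())
    # Push the consumed item back in front of the untouched remainder.
    return True, chain([first], gen)


def fake_timeline(n_pages: int) -> Iterator[dict]:
    """Hypothetical stand-in for Twarc2.timeline(): yields n_pages response pages."""
    for i in range(n_pages):
        yield {"data": [{"id": str(i)}]}


ok, pages = has_items(fake_timeline(3))
print(ok, len(list(pages)))  # True 3

ok, pages = has_items(fake_timeline(0))
print(ok, list(pages))  # False []
```

This avoids materializing everything up front, at the cost of returning a new iterator that must be used in place of the original generator.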

Alternatively, is there a way to combine...

someList = []
someList.extend(new_tweets)
while len(someList) > 0:
    # blah blah

...with the while, into a single line?
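On Python 3.8 or later, an assignment expression (:=) can do the list() conversion and the emptiness test inside the while condition itself. A minimal sketch, assuming a hypothetical fake_timeline that stands in for t.timeline(...) and returns one page per call until its input runs out:

```python
from typing import Iterator, List


def fake_timeline(pending_ids: List[int]) -> Iterator[dict]:
    """Hypothetical stand-in for t.timeline(): yields one page of up to two tweets
    taken (and removed) from pending_ids, or nothing once the list is empty."""
    batch = pending_ids[:2]
    del pending_ids[:2]
    if batch:
        yield {"data": [{"id": str(i)} for i in batch]}


pending = [1, 2, 3, 4, 5]
collected: List[dict] = []

# Build the list and test it for emptiness in one expression (Python 3.8+).
while new_tweets := list(fake_timeline(pending)):
    collected.extend(new_tweets)

print(len(collected))                                # 3 pages
print(sum(len(page["data"]) for page in collected))  # 5 tweets
```

Because new_tweets is a list here, it can be both tested and extended from without being used up.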

Edit: I tried print(list(new_tweets)) right before the while loop, and it printed []. It looks like the object really is empty.

Is it because alltweets.extend(new_tweets) somehow consumes the new_tweets generator...?
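That is what happens: iterating a generator consumes it, and list.extend() iterates its argument to the end. Afterwards the same generator object yields nothing, so list(), sum(...) and dict() on it all see an empty sequence. A minimal, standard-library-only demonstration, with a hypothetical pages() generator playing the role of timeline():

```python
from typing import Iterator


def pages() -> Iterator[dict]:
    """Hypothetical generator standing in for Twarc2.timeline()."""
    yield {"data": [{"id": "1"}]}
    yield {"data": [{"id": "2"}]}


# Case 1: keep the generator and extend from it.
gen_tweets = pages()
alltweets = []
alltweets.extend(gen_tweets)     # iterates gen_tweets to the end
leftover = list(gen_tweets)      # nothing left to yield
print(len(alltweets), leftover)  # 2 []
print(len(dict(gen_tweets)))     # 0, because dict() also sees an exhausted generator

# Case 2: materialize once, then reuse freely.
list_tweets = list(pages())
alltweets2 = []
alltweets2.extend(list_tweets)   # iterating a list does not consume it
print(len(list_tweets), len(alltweets2))  # 2 2
```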

Answer 1

I figured it out myself. The problem can be solved by converting the generator into a list:

new_tweets = list(t.timeline(user=user_id,until_id=oldest))
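Applied to the whole loop, the same idea might look like this sketch. It is not twarc's documented usage: FakeTwarc2 is a hypothetical offline stand-in that returns one page per call, and fetch_pages is a hypothetical helper. It assumes each page is a dict with data and meta.oldest_id, as in the question's code, and skips pages without data as a precaution. Because the v2 until_id parameter only returns tweets older than the given ID, the sketch passes oldest_id as-is instead of subtracting one. It may also be worth checking whether your twarc version's timeline() already follows pagination on its own, in which case a single call could return every available page.

```python
from typing import Dict, Iterator, List, Optional


class FakeTwarc2:
    """Hypothetical stand-in for twarc's Twarc2, only so this sketch runs offline.

    Each timeline() call yields at most one page shaped like a v2 response,
    {"data": [...], "meta": {"oldest_id": ...}}, newest tweets first.
    """

    def __init__(self, tweet_ids: List[int], page_size: int = 2) -> None:
        self.tweet_ids = sorted(tweet_ids, reverse=True)
        self.page_size = page_size

    def timeline(self, user: str, until_id: Optional[str] = None) -> Iterator[Dict]:
        # Mimic until_id: only tweets with a strictly smaller ID are returned.
        older = [t for t in self.tweet_ids if until_id is None or t < int(until_id)]
        batch = older[: self.page_size]
        if batch:
            yield {
                "data": [{"id": str(t)} for t in batch],
                "meta": {"oldest_id": str(batch[-1])},
            }


def fetch_pages(client, user_id: str, until_id: Optional[str] = None) -> List[Dict]:
    """Materialize timeline() into a list, keeping only pages that contain tweets."""
    return [p for p in client.timeline(user=user_id, until_id=until_id) if p.get("data")]


def get_all_tweets(client, user_id: str) -> List[Dict]:
    alltweets: List[Dict] = []  # response pages, newest first
    new_tweets = fetch_pages(client, user_id)
    # new_tweets is a list, so its truthiness reflects what was actually fetched.
    while new_tweets:
        alltweets.extend(new_tweets)
        oldest = alltweets[-1]["meta"]["oldest_id"]  # until_id is exclusive; no "- 1"
        new_tweets = fetch_pages(client, user_id, until_id=oldest)
    # Flatten the pages into one list of tweet dicts.
    return [tweet for page in alltweets for tweet in page["data"]]


tweets = get_all_tweets(FakeTwarc2([101, 102, 103, 104, 105]), user_id="12345")
print([t["id"] for t in tweets])  # ['105', '104', '103', '102', '101']
```

With a real client, FakeTwarc2(...) would be replaced by the Twarc2 instance from the question.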


Tags: python, twitter, twitterapi-python, twarc2