Twitter API connection aborted with Twython

I am trying to download the Twitter followers of a list of accounts. My function (which uses twython) works fine for shorter account lists but raises an error on longer ones. It is not a rate-limit issue, because the function sleeps until the next time bin whenever the rate limit is hit. The error is this:

twythonerror: ('Connection aborted.', error(10054, ''))

Other people appear to have had the same problem, and the suggested solution was to make the function sleep between different REST API calls, so I implemented the following code:

    del twapi
    sleep(nap[afternoon])
    afternoon = afternoon + 1
    twapi = Twython(app_key=app_key, app_secret=app_secret,
                oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)

nap is a list of intervals in seconds and afternoon is an index into it. Despite this suggestion I still get exactly the same error, so it seems that sleeping does not solve the problem. Can anyone help me?
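In isolation, the sleep-and-recreate step looks like the sketch below. make_client is an illustrative name rather than part of my actual code, the modulo wrap stands in for the "if afternoon >= 7" reset used in the full function, and the credentials are the same module-level variables (app_key and so on) that the function below relies on.

from time import sleep
from twython import Twython

def make_client(nap, afternoon):
    """Illustrative sketch of the suggested fix: sleep with an increasing
    back-off interval, then hand back a fresh Twython client."""
    sleep(nap[afternoon % len(nap)])  # wrap the index instead of resetting it
    return Twython(app_key=app_key, app_secret=app_secret,
                   oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)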

Here is the whole function:

def download_follower(serie_lst):
    """Creates account named txt files containing followers ids. Uses for loop on accounts names list."""
    nap = [1, 2, 4, 8, 16, 32, 64, 128]    
    afternoon = 0

    for exemplar in serie_lst:

        #username from serie_lst entries
        account_name = exemplar

        twapi = Twython(app_key=app_key, app_secret=app_secret,
                        oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)

        try:
            #initializations
            del twapi
            if afternoon >= 7:
                afternoon = 0

            sleep(nap[afternoon])
            afternoon = afternoon + 1
            twapi = Twython(app_key=app_key, app_secret=app_secret,
                        oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
            next_cursor = -1
            result = {}
            result["screen_name"] = ""
            result["followers"] = []
            iteration = 0
            file_name = ""

            #user info
            user = twapi.lookup_user(screen_name = account_name)

            #store user name
            result['screen_name'] = account_name

            #loop until all cursored results are stored
            while (next_cursor != 0):
                sleep(random.randrange(start = 1, stop = 15, step = 1))
                call_result = twapi.get_followers_ids(screen_name = account_name, cursor = next_cursor)
                #loop over the returned follower ids and append each one to result["followers"]
                for i in call_result["ids"]:
                    result["followers"].append(i)
                next_cursor = call_result["next_cursor"] #new next_cursor
                iteration = iteration + 1
                if (iteration > 13): #back off once the 15-requests/15-min window is nearly exhausted
                    error_msg = localtime()
                    error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
                    error_msg ="".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
                    print(error_msg)
                    del error_msg
                    sleep(901) #15min + 1sec
                    iteration = 0

            #output file
            file_name = "".join([account_name, ".txt"])

            #print output
            out_file = open(file_name, "w") #open file "account_name.txt"
            #out_file.write(str(result["followers"])) #standard format
            for i in result["followers"]: #R friendly table format
                out_file.write(str(i))
                out_file.write("\n")
            out_file.close()

        except twython.TwythonRateLimitError:
            #wait
            error_msg = localtime()
            error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
            error_msg ="".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
            print(error_msg)
            del error_msg
            del twapi
            sleep(901) #15min + 1sec

            #initializations
            if afternoon >= 7:
                afternoon = 0

            sleep(nap[afternoon])
            afternoon = afternoon + 1
            twapi = Twython(app_key=app_key, app_secret=app_secret,
                        oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
            next_cursor = -1
            result = {}
            result["screen_name"] = ""
            result["followers"] = []
            iteration = 0
            file_name = ""

            #user info
            user = twapi.lookup_user(screen_name = account_name)

            #store user name
            result['screen_name'] = account_name

            #loop until all cursored results are stored
            while (next_cursor != 0):
                sleep(random.randrange(start = 1, stop = 15, step = 1))
                call_result = twapi.get_followers_ids(screen_name = account_name, cursor = next_cursor)
                #loop over the returned follower ids and append each one to result["followers"]
                for i in call_result["ids"]:
                    result["followers"].append(i)
                next_cursor = call_result["next_cursor"] #new next_cursor
                iteration = iteration + 1
                if (iteration > 13): #back off once the 15-requests/15-min window is nearly exhausted
                    error_msg = localtime()
                    error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
                    error_msg = "".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
                    print(error_msg)
                    del error_msg
                    sleep(901) #15min + 1sec
                    iteration = 0

            #output file
            file_name = "".join([account_name, ".txt"])

            #print output
            out_file = open(file_name, "w") #open file "account_name.txt"
            #out_file.write(str(result["followers"])) #standard format
            for i in result["followers"]: #R friendly table format
                out_file.write(str(i))
                out_file.write("\n")
            out_file.close()

As discussed in the comments, your code has a few problems as it stands. You don't need to delete your connection for it to run properly, and I think the issue arises because on your second initialization you have nothing in place to catch hitting the rate limit. Here is an example using Tweepy of how you can get the information you require:

import tweepy
from datetime import datetime


def download_followers(user, api):
    all_followers = []
    try:
        for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
            all_followers.extend(map(str, page))
        return all_followers
    except tweepy.TweepError:
        print('Could not access user {}. Skipping...'.format(user))

# Include your keys below:
consumer_key = 'YOUR_KEY'
consumer_secret = 'YOUR_KEY'
access_token = 'YOUR_KEY'
access_token_secret = 'YOUR_KEY'

# Set up tweepy API, with handling of rate limits
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
main_api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# List of usernames to get followers for
lookup_users = ['asongtoruin', 'mbiella']

for username in lookup_users:
    user_followers = download_followers(username, main_api)
    if user_followers:
        with open(username + '.txt', 'w') as outfile:
            outfile.write('\n'.join(user_followers))
        print('Finished outputting: {} at {}'.format(username, datetime.now().strftime('%Y/%m/%d %H:%M:%S')))

Tweepy is smart enough to know when it has hit its rate limit when we use wait_on_rate_limit=True, and checks how long it needs to sleep before it can start up again. By using wait_on_rate_limit_notify=True we allow it to print out how long it will be waiting until it can next get a page of followers (with this ID-based method, there appear to be 5,000 IDs per page, so an account with 12,000 followers, for example, takes three calls).

We also catch a TweepError exception - this occurs when the supplied username relates to a protected account that our authenticated user does not have permission to view. In that case we simply skip the user so that the other information can still be downloaded, but print a warning that the account could not be accessed.
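If you want the warning to say why a user was skipped, a possible refinement (a sketch, not part of the answer above, assuming str() on a TweepError yields the API's reason string; imports as above) is to include the exception in the message:

def download_followers(user, api):
    all_followers = []
    try:
        for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
            all_followers.extend(map(str, page))
        return all_followers
    except tweepy.TweepError as error:
        # The exception's text carries the API's reason, e.g. "Not authorized."
        # for protected accounts.
        print('Could not access user {} ({}). Skipping...'.format(user, error))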

Running this saves a text file of follower IDs for every user it can access. For me this prints the following:

Rate limit reached. Sleeping for: 593
Finished outputting: asongtoruin at 2017/02/22 11:43:12
Could not access user mbiella. Skipping...

with the follower IDs for asongtoruin (AKA me) saved as asongtoruin.txt.

One possible issue is that our pages of followers start from the most recent. If new users follow an account between our calls, this could (though I don't understand the API well enough to say for certain) cause problems for the output dataset, as we might miss those users and end up with duplicates instead. If duplicates become an issue, you could change return all_followers to return set(all_followers).
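In context, that one-line change would look like this (the rest of the function and its imports are unchanged from above; note that a set discards the order in which the IDs arrived):

def download_followers(user, api):
    all_followers = []
    try:
        for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
            all_followers.extend(map(str, page))
        # Deduplicate across pages; this also discards the original ordering.
        return set(all_followers)
    except tweepy.TweepError:
        print('Could not access user {}. Skipping...'.format(user))

The '\n'.join(user_followers) call in the output step works on a set just as it does on a list.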