Python While Loop Problem - Instagram API Returns Pagination Objects but not new results

Question

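I'm trying to pull a list of Instagram posts that have been tagged with a particular hashtag. I'm using the RAPIDAPI found here. Instagram paginates the results it returns, so I have to loop through the pages to get all of them. I've run into a very strange bug: I receive the next page I requested, but the posts are from the previous page.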

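To use a book analogy: I can see page 1 of the book, and I can ask the book to show me page 2. The book shows me a page labelled page 2, but its contents are identical to page 1.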

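When I use the container provided on the RapidAPI website, I don't get this error. That leads me to believe the problem must be on my end, presumably in the while loop I wrote.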

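If someone could look over my 'while' loop, or suggest anything else that might fix the problem, I'd greatly appreciate it. The "list index out of range" error at the bottom is easy to fix, so I'm not concerned about it.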

Additional info: this particular hashtag has 694 results, and the API returns pages of 50 result items.
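As a rough sanity check on the loop, those two numbers imply how many pages a complete run should fetch. This assumes every page except the last holds exactly 50 items, which the API does not promise:

```python
import math

TOTAL_RESULTS = 694  # 'count' reported for the hashtag
PAGE_SIZE = 50       # items per page, as observed in the question (assumed constant)

# Round up: a partial final page still needs its own request.
pages = math.ceil(TOTAL_RESULTS / PAGE_SIZE)
last_page_items = TOTAL_RESULTS - (pages - 1) * PAGE_SIZE
print(pages, last_page_items)  # 14 44
```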

import http.client
import json
import time


conn = http.client.HTTPSConnection("instagram-data1.p.rapidapi.com") #endpoint supplied by RAPIDAPI
##Begin Credential Section
headers = {
    'x-rapidapi-key': "*removed*",
    'x-rapidapi-host': "instagram-data1.p.rapidapi.com"
    }
##End Credential Section
hashtag = 'givingtuesdayaus'

conn.request("GET", "/hashtag/feed?hashtag=" + hashtag, headers=headers)

res = conn.getresponse()
data = res.read()
print(data.decode("utf-8")) #Purely for debugging, can be disabled
json_dictionary = json.loads(data.decode("utf-8")) #Saving returned results into JSON format, because I find it easier to work with
i = 1 # Results need to cycle through pages, using 'i' to track the number of loops and for input in the name of the file which is saved
with open(hashtag + str(i) + '.json', 'w') as json_file:
    json.dump(json_dictionary['collector'], json_file)

#JSON_dictionary contains five fields, 'count' which is number of results for hashtag query, 'has_more' boolean indicating if there are additional pages
# 'end_cursor' string which can be added to the url to cycle to the next page, 'collector' list containing post information, and 'len'

#while loop essentially checks if the 'has_more' indicates there are additional pages, if true uses the 'end_cursor' value to cycle to the next page
while json_dictionary['has_more']:
    time.sleep(1)
    cursor = json_dictionary['end_cursor']
    conn.request("GET", "/hashtag/feed?hashtag=" + hashtag +'&end-cursor=' + cursor, headers=headers)
    res = conn.getresponse()
    data = res.read()
    json_dictionary = json.loads(data.decode("utf-8"))
    i += 1
    print(i)
    print(json_dictionary['collector'][1]['id'])
    print(cursor) #these three prints rows are only used for debugging.
    with open(hashtag + str(i) + '.json', 'w') as json_file:
        json.dump(json_dictionary['collector'], json_file)
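For comparison, the same loop can be written as a generator that follows `end_cursor` while `has_more` is true and fails loudly if a page comes back twice, which is the symptom described in the question. This is a sketch, not the RapidAPI client: `iter_pages` and the `fetch_page` callable are hypothetical names, and the field names (`has_more`, `end_cursor`, `collector`, `id`) are taken from the response shown in the question.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[list]:
    """Yield each page's 'collector' list, following 'end_cursor' while 'has_more' is true.

    fetch_page is a hypothetical callable: given a cursor (None for the first
    page), it returns the decoded JSON dictionary for that page.
    """
    cursor: Optional[str] = None
    seen_first_ids = set()
    while True:
        page = fetch_page(cursor)
        items = page.get('collector', [])
        # Detect the API ignoring the cursor and serving a page we already saw.
        first_id = items[0]['id'] if items else None
        if first_id is not None and first_id in seen_first_ids:
            raise RuntimeError(
                f"page starting with post {first_id} was returned twice; "
                "check that the cursor parameter is being sent correctly"
            )
        seen_first_ids.add(first_id)
        yield items
        if not page.get('has_more'):
            break
        cursor = page['end_cursor']


# Usage with two canned pages standing in for real API responses.
fake_pages = {
    None: {'has_more': True, 'end_cursor': 'A', 'collector': [{'id': '1'}]},
    'A': {'has_more': False, 'end_cursor': None, 'collector': [{'id': '2'}]},
}
for page_items in iter_pages(lambda c: fake_pages[c]):
    print([post['id'] for post in page_items])
```

In practice `fetch_page` would wrap the `conn.request(...)`, `res.read()` and `json.loads(...)` lines from the question; passing it in as a function also makes the loop easy to test against canned pages, as above.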

Output from the Python console (as you can see, the cursor and 'i' advance, but the post id stays the same; the saved JSON files also all contain the same posts):

> {"count":694,"has_more":true,"end_cursor":"QVFCd2pVdEN2d01rNkw3UmRKSGVUN1EyanBlYzBPMS15MkIyUG1VdHhjWlJWMDBwRmVhaEYxd0czSE0wMktFcGhfMnItak5ZOE1GTzJvd05FU0pTMWxmVg==","collector":[{"id":"2467140087692742224","shortcode":"CI9CtaaDU5Q","type":"GraphImage",.....}
> #shortened by poster
> 2
> 2464906276234990574
> QVFCd2pVdEN2d01rNkw3UmRKSGVUN1EyanBlYzBPMS15MkIyUG1VdHhjWlJWMDBwRmVhaEYxd0czSE0wMktFcGhfMnItak5ZOE1GTzJvd05FU0pTMWxmVg==
> 3
> 2464906276234990574
> QVFDVUlROFVKVVB3SEwyR05MSzJHZ2V1UXZqSzlzTVFhWDNBM3hXNENMcThKWExwWU90RFRnRm1FNWtSRGtrbTdORFIwRlU2QWZaSVByOHZhSXFnQnJsVg==
> 4
> 2464906276234990574
> QVFEVFpheV9SeFZCcWlKYkc3NUZZdG00Rk5KMWJsQVBNakJlZDcyMGlTWm9rUTlIQzRoYjVtTU1uRmhJZG5TTFBSOXdhbHozVUViUjZEbVpLdjVUQlJtVQ==
> Traceback (most recent call last):
>   File "<input>", line 33, in <module>
> IndexError: list index out of range

Answer 1

It looks like you're indexing past the end of a list: you're asking for an element at a position the list doesn't have.

Example:

data = [1,2,3,4,5]

You have to use an index that exists in the list, like

data[4]

not like this

data[6]

or you get an error:

IndexError: list index out of range

The problem is probably here:

print(json_dictionary['collector'][1]['id'])
print(cursor) #these three prints rows are only used for debugging.
with open(hashtag + str(i) + '.json', 'w') as json_file:
    json.dump(json_dictionary['collector'], json_file)
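To keep the debugging print from crashing when a page has fewer items than expected, the lookup can be bounds-checked first. A minimal sketch; `safe_get` is a hypothetical helper, not part of Python or of the RapidAPI response:

```python
def safe_get(items: list, index: int, default=None):
    """Return items[index] when that index exists, otherwise default."""
    # Valid indices for a list of length n run from -n to n - 1.
    if -len(items) <= index < len(items):
        return items[index]
    return default


data = [1, 2, 3, 4, 5]
print(safe_get(data, 4))  # 5
print(safe_get(data, 6))  # None, instead of raising IndexError
```

In the question's loop, `post = safe_get(json_dictionary['collector'], 1)` followed by `if post is not None: print(post['id'])` would skip the print on a short page instead of stopping the script.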

Answer 2

Apologies to anyone who read this far; I'm an idiot.

I found the error shortly after posting:

conn.request("GET", "/hashtag/feed?hashtag=" + hashtag +'&end-cursor=' + cursor, headers=headers)

'end-cursor' should be 'end_cursor'.
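A hand-concatenated query string makes this kind of typo easy to miss, since the API apparently ignored the unrecognised parameter and kept returning the same posts. One option is to keep the parameter names in a dict and let `urllib.parse.urlencode` build the string, which also percent-escapes the `=` padding in the base64 cursor. A sketch, assuming the endpoint takes `hashtag` and `end_cursor` query parameters (as in the question and the fix above) and decodes standard percent-encoding; `feed_path` is a hypothetical helper:

```python
from typing import Optional
from urllib.parse import urlencode


def feed_path(hashtag: str, cursor: Optional[str] = None) -> str:
    """Build the request path for one page of the hashtag feed (hypothetical helper)."""
    # Parameter names follow the question: 'hashtag' and, per the fix,
    # 'end_cursor' (underscore, not hyphen).
    params = {'hashtag': hashtag}
    if cursor:
        params['end_cursor'] = cursor
    # urlencode percent-escapes characters such as '=' in the base64 cursor.
    return '/hashtag/feed?' + urlencode(params)


print(feed_path('givingtuesdayaus'))
print(feed_path('givingtuesdayaus', 'QVFC==')) # ...&end_cursor=QVFC%3D%3D
```

In the question's loop this would be `conn.request("GET", feed_path(hashtag, cursor), headers=headers)`.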


Tags: python, while-loop, instagram