My code is treating a list of dictionaries like a string: TypeError: string indices must be integers

Question

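I am using the reddit API, and for reasons unrelated to this case I do not want to use a reddit wrapper here. The code is actually quite simple: it extracts the comments and first-level replies from a specific post in a subreddit.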

Here is the code for the function:

import requests
import pandas as pd

def getcommentsforpost(subredditname, postid):

    #here we make the request to reddit, and create a python dictionary   
    #from the resulting json code


    reditpath = '/r/' + subredditname + '/comments/' + postid
    redditusual = 'https://www.reddit.com'
    parameters = '.json?'
    totalpath = redditusual + reditpath + parameters
    p = requests.get(totalpath, headers = {'User-agent' : 'Chrome'})
    result = p.json()

    #we are going to be looping a lot through dictionaries, to extract
    # the comments and their replies, thus, a list where we will insert  
    # them.
    totallist = [] 

    # the result object is a list with two dictionaries, one with info 
    #on the post, and the second one with all the info regarding the 
    #comments and their respective replies, because of this, we first 
    # process the posts info located in result[0]


    a = result[0]["data"]["children"][0]["data"]
    abody = a["selftext"]
    aauthor = a["author"]
    ascore = a["score"]
    adictionary = {"commentauthor" : aauthor , "comment" : abody , "Type" : "Post",
                       "commentscore" : ascore}

    totallist.append(adictionary)




    # and now, we start processing the comments, located in result[1]

    for i in result[1]["data"]["children"]:

        ibody = i["data"]["body"]
        iauthor = i["data"]["author"]
        iscore = i["data"]["score"]



        idictionary = {"commentauthor" : iauthor , "comment" : ibody , "Type" : "post_comment",
                       "commentscore" : iscore}

        totallist.append(idictionary)

        # To clarify: up to here the code works perfectly, no problem
        # whatsoever. It is exactly in the following section that the
        # error happens.

        # We create a new object, called replylists, that contains a
        # list of dictionaries in every iteration of the loop.

        replylists =  i["data"]["replies"]["data"]["children"]

        # we are going to loop through them, in every comment we extract


        for j in replylists:
            jauthor = j["data"]["author"]
            jbody = j["data"]["body"]
            jscore = j["data"]["score"]


            jdictionary = {"commentauthor" : jauthor , "comment" : jbody , "Type" : "comment_reply" , 
                           "commentscore" : jscore } 
            totallist.append(jdictionary)

        # just like we did with the post info and the normal comments,
        # we extract and put it in totallist.



        finaldf = pd.DataFrame(totallist)



    return(finaldf)

getcommentsforpost("Python","a7zss0")

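But when that loop over the replies runs, the code fails. It returns the error "string indices must be integers" and points at the replylists variable. However, when I run the code outside the loop like this: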

result[1]["data"]["children"][4]["data"]["replies"]["data"]["children"][0]

it works perfectly, and it should have the same effect. I believe it is treating replylists as a string instead of a list (which is its class).

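Things I have tried:

I checked the class of replylists with the type() function. It did return "list", but only for 5 iterations of the loop, after which it failed with the same error.

I tried looping over the list with for ja in range(0,len(replylists)) and then creating the j variable as replylists[ja]. It returned the same error.

I have been debugging for two hours. Without that piece of code, the function runs perfectly (of course, it does not return the replies in the final dataframe, but it runs). Why is this happening? replylists is a list of dictionaries, not a string, yet it gives that weird error.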

This is the reddit documentation for the function we are using: https://www.reddit.com/dev/api#GET_comments_{article}

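Libraries to import: requests, pandas as pd, json

I repeat, recommending a wrapper is not the solution; I want to solve this with JSON and REST.

Working on: 'Python version 3.6.5 |Anaconda version 5.2.0,jupyter notebook 5.5.0 '

Thank you in advance. I hope you find it interesting; I will keep working on it from here.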

Answer 1

I did some digging, copied your code into a local environment, and did some debugging, mainly:

try:
    replylists = i["data"]["replies"]["data"]["children"]
except TypeError:
    # print every key of the comment that failed, then stop
    for point in i['data']:
        print(point)
    exit()

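With this, I saw that i["data"] does have values (57 of them, in fact) and that one of the 57 is replies, but after some more looking I found that the replies content was empty: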

'replies': '' is what I saw when I printed the failing value of i directly.
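To see why an empty replies value produces this exact error, here is a minimal sketch. The comment object is a made-up, trimmed-down stand-in (the author name is hypothetical), and it assumes only what the printout shows: replies is '' rather than a dict.

```python
# Hypothetical, trimmed-down comment object: "replies" is an empty string
# instead of a dict, as seen when printing the failing value of i.
comment = {"data": {"author": "someone", "replies": ""}}

message = None
try:
    # Indexing a str with another str is what raises the TypeError.
    comment["data"]["replies"]["data"]["children"]
except TypeError as exc:
    message = str(exc)

print(type(comment["data"]["replies"]).__name__, "->", message)
```

So the list of dictionaries is never treated as a string; the failure happens one step earlier, when ["data"] is applied to the empty string.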

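However, all hope is not lost: you simply forgot to skip the iterations where the replies content is empty (''). I also ran a check to see how many of your iterations actually fail; some succeed and some fail (for the reason mentioned above).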

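With that, I suggest using try and except to debug when you hit errors like this (it is a useful skill) and, more on topic for your question, deciding what you want to do when the replies content is empty.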
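As one possible shape for that kind of try/except debugging, the sketch below records which comments fail instead of exiting on the first one. The function name find_failing_comments and the sample data are hypothetical; the sample only mimics the two shapes of replies described above (a nested dict, or an empty string).

```python
def find_failing_comments(children):
    """Return (index, repr of replies, exception name) for every comment
    whose replies cannot be walked down to ["data"]["children"]."""
    failures = []
    for index, child in enumerate(children):
        try:
            child["data"]["replies"]["data"]["children"]
        except (TypeError, KeyError) as exc:
            replies = child.get("data", {}).get("replies")
            failures.append((index, repr(replies), type(exc).__name__))
    return failures


# Hypothetical stand-in for result[1]["data"]["children"].
sample_children = [
    {"data": {"replies": {"data": {"children": []}}}},  # comment with a replies dict
    {"data": {"replies": ""}},                          # comment with no replies
]

print(find_failing_comments(sample_children))
```

Running something like this against the real response would show at a glance how many comments fail and what their replies value looks like.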

Best of luck, and I hope this helps.

Answer 2

Here is how I solved it: I added an if statement that checks whether ["data"]["replies"] is a dictionary. If it is, the reply code runs; if it is not, the loop continues.

This is how it looks; thanks again to Aditya and Goyo:

import requests
import pandas as pd


def getcommentsforpost(subredditname, postid):
    reditpath = '/r/' + subredditname + '/comments/' + postid
    redditusual = 'https://www.reddit.com'
    parameters = '.json?'
    totalpath = redditusual + reditpath + parameters
    p = requests.get(totalpath, headers={'User-agent': 'Chrome'})
    result = p.json()

    totallist = []

    # the result object is a list with two dictionaries, one with info on the post, and the second one
    # with all the info regarding the comments and their respective replies
    a = result[0]["data"]["children"][0]["data"]
    abody = a["selftext"]
    aauthor = a["author"]
    ascore = a["score"]
    adictionary = {"commentauthor": aauthor, "comment": abody, "Type": "Post",
                   "commentscore": ascore}

    totallist.append(adictionary)

    for i in result[1]["data"]["children"]:

        ibody = i["data"]["body"]
        iauthor = i["data"]["author"]
        iscore = i["data"]["score"]

        idictionary = {"commentauthor": iauthor, "comment": ibody, "Type": "post_comment",
                       "commentscore": iscore}

        totallist.append(idictionary)

        # "replies" is a dict when the comment has replies and an empty
        # string ('') when it has none, so only descend into dicts
        if isinstance(i["data"]["replies"], dict):

            replylists = i["data"]["replies"]["data"]["children"]

            for j in replylists:
                jauthor = j["data"]["author"]
                jbody = j["data"]["body"]
                jscore = j["data"]["score"]
                jdictionary = {"commentauthor": jauthor, "comment": jbody, "Type": "comment_reply",
                               "commentscore": jscore}

                totallist.append(jdictionary)

        elif isinstance(i["data"]["replies"], str):
            # no replies for this comment, move on to the next one
            continue

    finaldf = pd.DataFrame(totallist)

    return finaldf
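The same isinstance check can also be pulled into a small helper so it can be tried without calling reddit at all. This is only a sketch: the helper names first_level_replies and reply_rows and the sample comments (including the author "alice") are hypothetical, and the sample data assumes nothing beyond the two shapes of replies seen in this thread.

```python
def first_level_replies(comment):
    """Return the list of reply objects for one comment, or [] if it has none."""
    replies = comment["data"].get("replies")
    # Empty replies were observed as '' rather than a dict,
    # so only descend when the value really is a dict.
    if not isinstance(replies, dict):
        return []
    return replies["data"]["children"]


def reply_rows(comment):
    """Build one row per first-level reply, using the same keys as totallist."""
    rows = []
    for reply in first_level_replies(comment):
        data = reply["data"]
        rows.append({
            "commentauthor": data["author"],
            "comment": data["body"],
            "Type": "comment_reply",
            "commentscore": data["score"],
        })
    return rows


# Hypothetical comments mimicking both shapes of "replies".
with_replies = {"data": {"replies": {"data": {"children": [
    {"data": {"author": "alice", "body": "hi", "score": 3}},
]}}}}
without_replies = {"data": {"replies": ""}}

print(reply_rows(with_replies))
print(reply_rows(without_replies))
```

Inside the main loop this would reduce the reply handling to totallist.extend(reply_rows(i)), with comments that have no replies contributing nothing.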