list index out of range for nested for loop

This is my df.

I created the function below to get trigrams based on the part-of-speech tags of the reviews.

def get_trigram(pos_1, pos_2, pos_3):
    all_trigram = []

    for j in range(len(df)):

        trigram = []

        for i in range(len(df['pos'][j]['pos'])):

            if [value for value in df['pos'][j]['pos']][i-2] == pos_1 and [value for value in df['pos'][j]['pos']][i-1] == pos_2 and [value for value in df['pos'][j]['pos']][i] == pos_3:
                trigram.append([value for value in df['pos'][j]['word']][i-2] + " " + [value for value in df['pos'][j]['word']][i-1] + " " + [value for value in df['pos'][j]['word']][i])

        all_trigram.append(trigram)
      
    return all_trigram

Defining the function raises no error, but when I call it

tri_adv_adj_noun = get_trigram('ADV', 'ADJ', 'NOUN')

I get this error: IndexError: list index out of range

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-149-12b4d4ffff3d> in <module>()
----> 1 tri_adv_adj_noun = get_trigram('ADV', 'ADJ', 'NOUN')
      2 tri_noun_adv_adj = get_trigram('NOUN', 'ADV', 'ADJ')
      3 
      4 trigram = tri_adv_adj_noun + tri_noun_adv_adj

<ipython-input-148-60ed39e749d0> in get_trigram(pos_1, pos_2, pos_3)
      8         for i in range(len(df_long['pos'][j]['pos'])):
      9 
---> 10             if [value for value in df_long['pos'][j]['pos']][i-2] == pos_1 and [value for value in df_long['pos'][j]['pos']][i-1] == pos_2 and [value for value in df_long['pos'][j]['pos']][i] == pos_3:
     11                 trigram.append([value for value in df_long['pos'][j]['word']][i-2] + " " + [value for value in df_long['pos'][j]['word']][i-1] + " " + [value for value in df_long['pos'][j]['word']][i])
     12 

IndexError: list index out of range

FYI,

df['pos'][0] returns a dictionary of 2 lists.

I assume your problem is in this part:

[value for value in df_long['pos'][j]['pos']][i-2]

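First, some of the 'pos' dictionaries in the 'pos' column may be missing data, in which case you should add a condition that first checks whether the dictionary is actually populated. Otherwise, indexing a list that has fewer elements than the index requires raises this error. For example, at i = 0 the index i-2 is -2, which counts two positions back from the end of the list; when the list does not have enough elements to go back that far, Python raises "list index out of range". A guard such as this avoids the error: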

if len(df['pos'][j]['pos']) >= 3:
   for i in range(len(df['pos'][j]['pos'])):
      ...
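To illustrate when the error actually appears, here is a small sketch using made-up tag lists (the `first_window` helper and the sample data are hypothetical, not from the question). Python treats a negative index as counting from the end of the list, so at i = 0 the lookup `tags[i-2]` becomes `tags[-2]`. That raises IndexError when the list has fewer than two elements, and silently wraps around to the end of the list otherwise.

```python
def first_window(tags):
    """Return the three tags the original loop reads at i = 0."""
    i = 0
    return [tags[i - 2], tags[i - 1], tags[i]]


# Hypothetical tag lists, for illustration only.
long_tags = ["ADV", "ADJ", "NOUN", "VERB"]
short_tags = ["NOUN"]

# With four elements, the negative indices wrap around to the end of the list.
wrapped = first_window(long_tags)

# With a single element, index -2 does not exist, so IndexError is raised.
try:
    first_window(short_tags)
    raised = False
except IndexError:
    raised = True
```

Note that a length guard prevents the exception, but the iterations at i = 0 and i = 1 still read wrapped-around elements, which can produce a "trigram" that joins the end of a review to its beginning.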

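Second, writing the code like this is redundant, because you are building a new list out of data that is already in a list. You can write: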

 if df_long['pos'][j]['pos'][i-2] == pos_1 and df_long['pos'][j]['pos'][i-1] == pos_2  etc..
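As a quick check of this point, the sketch below uses a made-up list named `tags` and assumes the values are plain Python lists, as the question suggests. The comprehension only builds a copy, so indexing the copy gives the same element as indexing the original directly.

```python
# Hypothetical tag list, for illustration only.
tags = ["ADV", "ADJ", "NOUN"]
i = 2

# The comprehension builds a brand-new list with the same elements.
copied = [value for value in tags]

# Indexing the copy and indexing the original give the same element.
same_element = copied[i] == tags[i]

# The copy is a separate object, rebuilt every time the expression runs.
is_new_object = copied is not tags
```

In the original function this copy is rebuilt several times per loop iteration, which is wasted work on long reviews.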

Or make it even more readable by adding variables with descriptive names:

for j in range(len(df)):

    trigram = []
    pos_list = df['pos'][j]['pos']

    if len(pos_list) >= 3:
       for i in range(len(pos_list)):
          if pos_list[i-2] == pos_1 and pos_list[i-1] == pos_2 ...
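Putting these suggestions together, one possible version of the function might look like the sketch below. It is not the original poster's code: the `rows` parameter, the `sample_rows` data, and the choice to start the loop at index 2 are assumptions made for illustration. Starting at 2 means i-2 is never negative, so short lists are skipped without an explicit length check and the first iterations can no longer wrap around to the end of the list.

```python
def get_trigram(rows, pos_1, pos_2, pos_3):
    """Collect 'w1 w2 w3' strings whose tags match (pos_1, pos_2, pos_3).

    `rows` is assumed to be an iterable of dicts shaped like
    {'word': [...], 'pos': [...]}. Returns one list of matches per row.
    """
    target = (pos_1, pos_2, pos_3)
    all_trigram = []

    for row in rows:
        words = row["word"]
        pos_list = row["pos"]
        trigram = []

        # Start at 2 so that i-2 and i-1 are never negative. For rows with
        # fewer than three tags the range is empty and the row yields [].
        # min() guards against the two lists having different lengths.
        for i in range(2, min(len(words), len(pos_list))):
            if (pos_list[i - 2], pos_list[i - 1], pos_list[i]) == target:
                trigram.append(" ".join(words[i - 2:i + 1]))

        all_trigram.append(trigram)

    return all_trigram


# Made-up sample data, for illustration only.
sample_rows = [
    {"word": ["very", "good", "food", "here"], "pos": ["ADV", "ADJ", "NOUN", "ADV"]},
    {"word": ["ok"], "pos": ["ADJ"]},
]
result = get_trigram(sample_rows, "ADV", "ADJ", "NOUN")
```

If the column really holds dictionaries of two lists, calling `get_trigram(df['pos'], 'ADV', 'ADJ', 'NOUN')` should work, since iterating over a pandas Series yields its values.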

Hope this helps!