Trying To Store an Index From df['column_name'].str.split(' ')[index] is Throwing an Index Error in Pandas

I'm working with a Kaggle dataset on NBA All-Stars (https://www.kaggle.com/fmejia21/nba-all-star-game-20002016), linked for anyone who wants to run this themselves. The dataset looks like this:

In [3]: df1.head(3)
Out[3]: 
   Year         Player Pos  ...                       Selection Type   NBA Draft Status    Nationality
0  2016  Stephen Curry   G  ...  Western All-Star Fan Vote Selection  2009 Rnd 1 Pick 7  United States
1  2016   James Harden  SG  ...  Western All-Star Fan Vote Selection  2009 Rnd 1 Pick 3  United States
2  2016   Kevin Durant  SF  ...  Western All-Star Fan Vote Selection  2007 Rnd 1 Pick 2  United States

[3 rows x 9 columns]

What I want to do is grab the draft position from the 'NBA Draft Status' column and store it in another column, so I first checked the split:

In [4]: df1['NBA Draft Status'].str.split(' ')
Out[4]: 
0       [2009, Rnd, 1, Pick, 7]
1       [2009, Rnd, 1, Pick, 3]

So it looks simple enough: just grab the item at index 4. If it's a second-round pick, add 30 to that number. I used this:

In [5]: positions = []
   ...: for draft in df1['NBA Draft Status']:
   ...:     if 'Rnd 2' in draft:
   ...:         position = draft.split(' ')[4]
   ...:         position = int(position) + 30
   ...:         positions.append(position)
   ...:     else:
   ...:         position = draft.split(' ')[4]
   ...:         position = int(position)
   ...:         positions.append(position)

And it throws an index error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-0946ed392ea2> in <module>
      6         positions.append(position)
      7     else:
----> 8         position = draft.split(' ')[4]
      9         position = int(position)
     10         positions.append(position)

IndexError: list index out of range

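OK... so here's the question: why is it out of range? While trying to investigate, I found that I can print this index but, for whatever reason, can't append it to an empty list. This works: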

In [6]: for draft in df1['NBA Draft Status']:
   ...:     print(draft.split(' ')[4])
   ...:     break
   ...: 
7

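Can anyone explain what's going on here? I know this is wordy, but I didn't know how to phrase the question without giving some context on the dataset.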

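The problem is that some values in df1['NBA Draft Status'] contain only 3 spaces, so calling .split() on them returns a list of 4 items with indices 0 through 3. Index 4 does not exist, which is what raises your IndexError. Your print loop only appeared to work because it breaks after the first row, which does have 5 items.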

df1['length'] = df1['NBA Draft Status'].apply(lambda draft: len(draft.split()))
df2 = df1.loc[df1.length == 4,:]
df2['NBA Draft Status']
Out[74]: 
309    1996 NBA Draft, Undrafted
334    1996 NBA Draft, Undrafted
346    1998 NBA Draft, Undrafted
348    1996 NBA Draft, Undrafted
360    1996 NBA Draft, Undrafted
371    1998 NBA Draft, Undrafted
Name: NBA Draft Status, dtype: object
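
To see the failure in isolation, here is a small sketch using one drafted and one undrafted value copied from the outputs above. The variable names are just for illustration.

```python
drafted = "2009 Rnd 1 Pick 7"
undrafted = "1996 NBA Draft, Undrafted"

drafted_parts = drafted.split(" ")      # 5 items, valid indices 0..4
undrafted_parts = undrafted.split(" ")  # 4 items, valid indices 0..3

print(len(drafted_parts), drafted_parts[4])
print(len(undrafted_parts))

try:
    undrafted_parts[4]  # no fifth item here
except IndexError as exc:
    print("IndexError:", exc)
```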

Drop the rows that don't split into 5 items with df1 = df1.loc[df1.length == 5,:] and then re-run your code. It will work.
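
If you would rather keep the undrafted players than drop them, one possible alternative is a vectorized extraction with a regular expression, which leaves missing values where the pattern doesn't match. This is only a sketch: the small frame below is a stand-in for df1, the "2005 Rnd 2 Pick 10" row is made up to exercise the second-round case, and the names `draft_round`, `draft_pick`, and `position` are my own. It follows the rule from the question of adding 30 for second-round picks.

```python
import pandas as pd

# Stand-in for df1; the second-round row is a made-up example value.
df = pd.DataFrame({
    "NBA Draft Status": [
        "2009 Rnd 1 Pick 7",
        "2009 Rnd 1 Pick 3",
        "2005 Rnd 2 Pick 10",
        "1996 NBA Draft, Undrafted",
    ]
})

# Named groups become column names; rows that don't match get NaN.
parts = df["NBA Draft Status"].str.extract(
    r"Rnd (?P<draft_round>\d+) Pick (?P<draft_pick>\d+)"
)

draft_round = pd.to_numeric(parts["draft_round"])
draft_pick = pd.to_numeric(parts["draft_pick"])

# Round 1 adds 0, round 2 adds 30; undrafted rows stay NaN.
df["position"] = draft_pick + 30 * (draft_round - 1)

print(df)
```

Because the undrafted rows are NaN, the new column ends up as floats. You can filter them out later with df["position"].notna() if you need integers.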