Trying to store an index from df['column_name'].str.split(' ')[index] throws an IndexError in pandas
I'm working with a Kaggle dataset on NBA All-Stars (https://www.kaggle.com/fmejia21/nba-all-star-game-20002016); the link is there for anyone who wants to run this themselves. The dataset looks like this:
In [3]: df1.head(3)
Out[3]:
Year Player Pos ... Selection Type NBA Draft Status Nationality
0 2016 Stephen Curry G ... Western All-Star Fan Vote Selection 2009 Rnd 1 Pick 7 United States
1 2016 James Harden SG ... Western All-Star Fan Vote Selection 2009 Rnd 1 Pick 3 United States
2 2016 Kevin Durant SF ... Western All-Star Fan Vote Selection 2007 Rnd 1 Pick 2 United States
[3 rows x 9 columns]
What I want to do is grab the draft position from the 'NBA Draft Status' column and store it in another column, so I first checked the split:
In [4]: df1['NBA Draft Status'].str.split(' ')
Out[4]:
0 [2009, Rnd, 1, Pick, 7]
1 [2009, Rnd, 1, Pick, 3]
So it looks simple: just grab the item at index 4, and if it is a second-round pick, add 30 to that number. I used this:
In [5]: positions = []
   ...: for draft in df1['NBA Draft Status']:
   ...:     if 'Rnd 2' in draft:
   ...:         position = draft.split(' ')[4]
   ...:         position = int(position) + 30
   ...:         positions.append(position)
   ...:     else:
   ...:         position = draft.split(' ')[4]
   ...:         position = int(position)
   ...:         positions.append(position)
and it throws an index error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-5-0946ed392ea2> in <module>
6 positions.append(position)
7 else:
----> 8 position = draft.split(' ')[4]
9 position = int(position)
10 positions.append(position)
IndexError: list index out of range
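OK, so here is the question: why is it out of range? While trying to track down the problem, I found that I can print this index but, for whatever reason, cannot append it to an empty list. This works: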
In [6]: for draft in df1['NBA Draft Status']:
   ...:     print(draft.split(' ')[4])
   ...:     break
   ...:
7
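Can anyone explain what is going on here? I know this is wordy, but I didn't know how to phrase the question without giving some background on the dataset.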
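The problem is that some values in df1['NBA Draft Status'] contain only 3 spaces, so calling .split() on them returns a list of 4 items, indexed 0 to 3. Asking for index 4 on those rows raises your IndexError. Your print test succeeded only because the break stops the loop after the first row, which does have 5 items. You can find the offending rows like this: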
df1['length'] = df1['NBA Draft Status'].apply(lambda draft: len(draft.split()))
df2 = df1.loc[df1.length == 4,:]
df2['NBA Draft Status']
Out[74]:
309 1996 NBA Draft, Undrafted
334 1996 NBA Draft, Undrafted
346 1998 NBA Draft, Undrafted
348 1996 NBA Draft, Undrafted
360 1996 NBA Draft, Undrafted
371 1998 NBA Draft, Undrafted
Name: NBA Draft Status, dtype: object
Drop them with df1 = df1.loc[df1.length == 5,:], then rerun your code. It will work.
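If you would rather keep the undrafted players than drop them, one possible alternative is to pull the round and pick out with a regular expression, so rows that don't match end up as NaN instead of raising. This is only a sketch: the three sample rows and the 'Draft Position' column name are made up for illustration, and it assumes every drafted row follows the "Rnd N Pick M" pattern shown in the question.

```python
import pandas as pd

# Hypothetical sample mirroring the formats seen in 'NBA Draft Status'
df = pd.DataFrame({
    "NBA Draft Status": [
        "2009 Rnd 1 Pick 7",
        "2011 Rnd 2 Pick 30",
        "1996 NBA Draft, Undrafted",
    ]
})

# Named groups become columns; rows that don't match get NaN in both
parts = df["NBA Draft Status"].str.extract(r"Rnd (?P<rnd>\d+) Pick (?P<pick>\d+)")
rnd = pd.to_numeric(parts["rnd"])
pick = pd.to_numeric(parts["pick"])

# Same rule as the loop: round 1 keeps the pick, round 2 adds 30
# 'Draft Position' is an illustrative column name, not one from the dataset
df["Draft Position"] = pick + 30 * (rnd - 1)
print(df)
```

The undrafted row keeps NaN in the new column, so you can decide later whether to drop it with dropna or fill it with a placeholder.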