Is there a way I can reindex rows in a dataframe that contains duplicates, such that the duplicates would also get reindexed?

Question

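I'm currently working on a project where I need to parse a dataframe containing all of the New York Knicks' shot data for the 2013-2014 season (about 7,000 rows). The first column is game_id, a unique identifier for each of the 82 games in the season. The first 72 rows have game_id set to 0021300008. The next 85 rows belong to the next game, with identifier 0021300018, and so on. I would like to reindex all of these rows so that the first game_id is 1, the next is 2, etc. I tried looking through the reindexing options in pandas, but I can't seem to find a solution. Does anyone have any suggestions?

Thanks.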

Answer 1

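Use Series.rank with method='dense', which gives every row of the same game the same number and leaves no gaps between consecutive games: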

df['game_id'] = df['game_id'].rank(method='dense').astype(int)
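As a quick illustration of what dense ranking does here, the sketch below uses a tiny made-up frame; the sample ids, the shot column, and the game_no column name are assumptions for the example, not the real data. It also assumes the ids are stored as strings (to keep the leading zeros) and converts them to numbers with pd.to_numeric before ranking, so the example does not depend on how rank treats string data.

```python
import pandas as pd

# Hypothetical sample: three games with 2, 3 and 1 shots respectively.
df = pd.DataFrame({
    "game_id": ["0021300008", "0021300008",
                "0021300018", "0021300018", "0021300018",
                "0021300031"],
    "shot": [1, 2, 1, 2, 3, 1],
})

# Dense ranking: equal ids share a rank, and ranks increase by 1 per distinct id.
# rank() returns floats, hence the final cast to int.
df["game_no"] = pd.to_numeric(df["game_id"]).rank(method="dense").astype(int)

print(df["game_no"].tolist())  # [1, 1, 2, 2, 2, 3]
```

Writing the result to a new column rather than overwriting game_id keeps the original identifiers available for later joins.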

Another option is to build a dictionary that maps each unique 'game_id' to 1, 2, 3, etc. and pass it to Series.map:

# The game ids are mapped to 1, 2, 3, etc. according to their order of appearance.
# For a rank-based mapping, use enumerate(sorted(df['game_id'].unique()), start=1) instead
# (unique() returns an array, which has no sort_values method).
idx_map = {idx: n for n, idx in enumerate(df['game_id'].unique(), start=1)}
df['game_id'] = df['game_id'].map(idx_map)
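The difference between the appearance-based and rank-based mappings only shows up when the rows are not already sorted by game. A minimal sketch, assuming a small made-up frame whose ids are zero-padded strings of equal length (so string order matches numeric order); the variable names are hypothetical:

```python
import pandas as pd

# Hypothetical sample in which the games do NOT appear in sorted order.
df = pd.DataFrame({
    "game_id": ["0021300018", "0021300018", "0021300008",
                "0021300031", "0021300008"],
})

# Number the games by the order in which they first appear.
by_appearance = {gid: n for n, gid in enumerate(df["game_id"].unique(), start=1)}

# Number the games by sorted id instead (equivalent to a dense rank).
by_rank = {gid: n for n, gid in enumerate(sorted(df["game_id"].unique()), start=1)}

appearance_ids = df["game_id"].map(by_appearance).tolist()
rank_ids = df["game_id"].map(by_rank).tolist()

print(appearance_ids)  # [1, 1, 2, 3, 2]
print(rank_ids)        # [2, 2, 1, 3, 1]
```

If the frame is already in chronological order, as in the question, both mappings give the same numbering.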


python

dataframe

pandas

reindex