Get a column of lists of indices showing which rows are equal in a Pandas DataFrame

Question

I have a Pandas DataFrame:

import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 5, size=(10, 1)), columns=['col0'])

That is, df has a single integer column, col0, with 10 rows.

I want a column that, for each row, lists the indices of all rows that have the same value as that row. I do:

df = df.assign(sameas=df.col0.apply(lambda val: [i for i, e in enumerate(df.col0) if e == val]))

and I get:

   col0        sameas
0     3           [0]
1     4  [1, 3, 4, 9]
2     2  [2, 6, 7, 8]
3     4  [1, 3, 4, 9]
4     4  [1, 3, 4, 9]
5     1           [5]
6     2  [2, 6, 7, 8]
7     2  [2, 6, 7, 8]
8     2  [2, 6, 7, 8]
9     4  [1, 3, 4, 9]

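This is the expected result. In my real application, df is much larger, and this method does not finish within the required time.

I think the runtime scales with the square of the number of rows, which is bad. How can I do the above computation faster?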

Answer 1

You can do a groupby with transform:

df['new'] = df.reset_index().groupby('col0')['index'].transform(lambda x: [x.tolist()] * len(x)).values

The transform result, before .values is taken:
0             [0]
1    [1, 3, 4, 9]
2    [2, 6, 7, 8]
3    [1, 3, 4, 9]
4    [1, 3, 4, 9]
5             [5]
6    [2, 6, 7, 8]
7    [2, 6, 7, 8]
8    [2, 6, 7, 8]
9    [1, 3, 4, 9]
Name: index, dtype: object

Answer 2

You can try grouping by col0 and converting the grouped indices to lists:

df['sameas'] = df['col0'].map(df.reset_index().groupby('col0')['index'].apply(list))

print(df)

   col0        sameas
0     3           [0]
1     4  [1, 3, 4, 9]
2     2  [2, 6, 7, 8]
3     4  [1, 3, 4, 9]
4     4  [1, 3, 4, 9]
5     1           [5]
6     2  [2, 6, 7, 8]
7     2  [2, 6, 7, 8]
8     2  [2, 6, 7, 8]
9     4  [1, 3, 4, 9]
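
As a sanity check, the sketch below rebuilds the example frame and compares this groupby/map result with the list comprehension from the question. The names expected, groups, and result are introduced here for illustration only. Note that the question's enumerate yields positions while the groupby yields index labels; the two agree here only because df has the default 0..n-1 index.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 5, size=(10, 1)), columns=['col0'])

# Reference: the quadratic list comprehension from the question
expected = df.col0.apply(lambda val: [i for i, e in enumerate(df.col0) if e == val])

# One list of index labels per distinct col0 value
groups = df.reset_index().groupby('col0')['index'].apply(list)

# Look up the list that belongs to the value in each row
result = df['col0'].map(groups)

print(result.tolist() == expected.tolist())
```

The lists are built once per distinct value and then looked up per row, so the work should grow roughly linearly with the number of rows rather than quadratically.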

Answer 3

Try:

import pandas as pd
import numpy as np

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 5, size=(10, 1)), columns=['col0'])
df
'''
   col0
0     3
1     4
2     2
3     4
4     4
5     1
6     2
7     2
8     2
9     4
'''

Build a Series to use as a mapping:

ser = df.groupby('col0').apply(lambda x: x.index.to_list())
ser
col0
1             [5]
2    [2, 6, 7, 8]
3             [0]
4    [1, 3, 4, 9]
dtype: object

Apply the mapping:

df.assign(col1=df.col0.map(ser))
'''
   col0          col1
0     3           [0]
1     4  [1, 3, 4, 9]
2     2  [2, 6, 7, 8]
3     4  [1, 3, 4, 9]
4     4  [1, 3, 4, 9]
5     1           [5]
6     2  [2, 6, 7, 8]
7     2  [2, 6, 7, 8]
8     2  [2, 6, 7, 8]
9     4  [1, 3, 4, 9]
'''
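
A possible variation, not part of the answer above: the groupby object also exposes a groups attribute that maps each key to the index labels of its rows, so the mapping can be built without calling a Python function per group. This is only a sketch; label_lists and out are illustrative names.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 5, size=(10, 1)), columns=['col0'])

# .groups maps each distinct col0 value to the Index of row labels holding it
label_lists = {key: labels.tolist() for key, labels in df.groupby('col0').groups.items()}

# Look up the list of labels for the value in each row
out = df.assign(col1=df.col0.map(label_lists))
print(out)
```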

Answer 4

One-liner approach:

df['col1'] = [df[df.col0.values == i].index.tolist() for i in df.col0.values]
df

Output:

   col0          col1
0     3           [0]
1     4  [1, 3, 4, 9]
2     2  [2, 6, 7, 8]
3     4  [1, 3, 4, 9]
4     4  [1, 3, 4, 9]
5     1           [5]
6     2  [2, 6, 7, 8]
7     2  [2, 6, 7, 8]
8     2  [2, 6, 7, 8]
9     4  [1, 3, 4, 9]
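
Note that this one-liner performs one full-column comparison per row, so its cost still grows with the square of the number of rows. One way to keep the boolean-mask idea while doing the comparison only once per distinct value is sketched below; per_value is a hypothetical helper name, and how much this helps depends on how many distinct values col0 has.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 5, size=(10, 1)), columns=['col0'])

values = df.col0.values

# One boolean mask per distinct value instead of one per row
per_value = {v: df.index[values == v].tolist() for v in np.unique(values)}

# Reuse the precomputed list for every row that holds that value
df['col1'] = [per_value[v] for v in values]
print(df)
```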
