Data Transforming/formatting in Python

Question

I have the following pandas DataFrame:

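import pandas as pd

data = {'ID_1': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
        'ID_2': ['a', 'b', 'c', 'f', 'g', 'd', 'v', 'x', 'y', 'z']
       }
df = pd.DataFrame(data)
display(df)  # display() is available in Jupyter/IPython; use print(df) elsewhere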

ID_1    ID_2
1   a
1   b
1   c
2   f
2   g
3   d
4   v
4   x
4   y
4   z

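For each ID_1, I need to find the combinations of ID_2 values (order does not matter). For example:

When ID_1 = 1, the combinations are ab, ac, bc. When ID_1 = 2, the only combination is fg.

Note that if an ID_1 value occurs fewer than 2 times, there are no combinations for it (see ID_1 = 3).

Finally, I need to store the resulting combinations in df2, as follows: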

Answer 1

One way, using itertools.combinations:

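from itertools import combinations

def comb_df(ser):
    # All unordered pairs of the group's ID_2 values;
    # a group with a single value yields an empty frame.
    return pd.DataFrame(list(combinations(ser, 2)), columns=["from", "to"])

# Build the pairs per ID_1 group, then discard the group index.
new_df = df.groupby("ID_1")["ID_2"].apply(comb_df).reset_index(drop=True)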

Output:

  from to
0    a  b
1    a  c
2    b  c
3    f  g
4    v  x
5    v  y
6    v  z
7    x  y
8    x  z
9    y  z
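
If it helps to see which ID_1 each pair came from, a variation on the same idea is to build the rows with a plain comprehension over the groups. This is only a sketch, not part of the answer above: the extra ID_1 column and the names rows, first and second are assumptions for illustration, and the result is stored in df2 as the question asks.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    "ID_1": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "ID_2": ["a", "b", "c", "f", "g", "d", "v", "x", "y", "z"],
})

# One row per unordered pair of ID_2 values within each ID_1 group.
# A group with a single ID_2 value (ID_1 == 3) yields no pairs and drops out.
rows = [
    (key, first, second)
    for key, group in df.groupby("ID_1")["ID_2"]
    for first, second in combinations(group, 2)
]

df2 = pd.DataFrame(rows, columns=["ID_1", "from", "to"])
print(df2)
```

Dropping the ID_1 column afterwards (df2[["from", "to"]]) should give the same from/to pairs as the output shown above.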

python

format

pandas