How to combine a pandas df so that rows with permuted col1 and col2 values are merged, keeping only one combination and summing a count column

Question

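I want to merge rows whose col1 and col2 values are permutations of each other into a single row, keeping only the first combination and summing the count column of both. Is there a simple way to do this in pandas?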

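Example DataFrame and output: in the DataFrame below, I want to merge the rows with values A, B and B, A and sum their count column. The same goes for the rows with values C, D and D, C. I want to keep the remaining rows of the DataFrame as they are.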

Input:

col1	col2	count
A	B	3
C	D	2
B	A	5
E	F	2
G	H	8
D	C	5
I	J	4

Output:

col1	col2	count
A	B	8
C	D	7
E	F	2
G	H	8
I	J	4

Answer 1

You can .groupby on the sorted col1/col2 values:

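# Group on an order-insensitive key: each row's (col1, col2) pair, sorted into a tuple.
x = (
    df.groupby(df[["col1", "col2"]].apply(lambda row: tuple(sorted(row)), axis=1))
    # Keep the first col1/col2 seen in each group and sum the counts.
    .agg({"col1": "first", "col2": "first", "count": "sum"})
    .reset_index(drop=True)
)
print(x)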

Prints:

  col1 col2  count
0    A    B      8
1    C    D      7
2    E    F      2
3    G    H      8
4    I    J      4
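
For reference, here is a self-contained sketch of the same idea that can be run on its own. It rebuilds the sample DataFrame from the question and, as an assumption beyond the snippet above, passes sort=False to groupby so that groups come out in order of first appearance rather than in sorted key order. The names key and result are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col1": ["A", "C", "B", "E", "G", "D", "I"],
    "col2": ["B", "D", "A", "F", "H", "C", "J"],
    "count": [3, 2, 5, 2, 8, 5, 4],
})

# Order-insensitive key: (A, B) and (B, A) both map to ("A", "B").
key = df[["col1", "col2"]].apply(lambda row: tuple(sorted(row)), axis=1)

# sort=False (an assumption, not in the snippet above) keeps groups in
# order of first appearance; "first" keeps the orientation of the first
# row seen in each group, and "sum" adds up the counts.
result = (
    df.groupby(key, sort=False)
    .agg({"col1": "first", "col2": "first", "count": "sum"})
    .reset_index(drop=True)
)
print(result)
```

Because "first" is used for col1 and col2, a group whose first row were B, A would be reported as B, A rather than A, B, which matches the "keep only the first combination" requirement in the question.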

Answer 2

We can use np.sort across rows to ensure the same values appear in the same columns (for example, A B and B A both become A B), then groupby and sum on the now-sorted columns:

# Sort Across Rows
df[['col1', 'col2']] = np.sort(df[['col1', 'col2']], axis=1)
# Accumulate counts by col1 and col2 (now in same columns)
df = df.groupby(['col1', 'col2'], as_index=False)['count'].sum()

df:

  col1 col2  count
0    A    B      8
1    C    D      7
2    E    F      2
3    G    H      8
4    I    J      4

Setup (DataFrame and imports):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'C', 'B', 'E', 'G', 'D', 'I'],
    'col2': ['B', 'D', 'A', 'F', 'H', 'C', 'J'],
    'count': [3, 2, 5, 2, 8, 5, 4]
})
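
Note that the np.sort approach overwrites col1 and col2 in the original df. If that is not wanted, one possible variant is to work on a copy. The sketch below wraps this in a helper; the function name merge_permuted_pairs and its parameters are hypothetical and not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def merge_permuted_pairs(frame, a="col1", b="col2", value="count"):
    """Hypothetical helper: sum `value` over rows whose (a, b) pair matches in either order."""
    out = frame.copy()  # leave the caller's DataFrame untouched
    # Sort each row's pair so that (B, A) becomes (A, B).
    out[[a, b]] = np.sort(out[[a, b]].to_numpy(), axis=1)
    # Rows with the same sorted pair now share a group key.
    return out.groupby([a, b], as_index=False)[value].sum()


df = pd.DataFrame({
    "col1": ["A", "C", "B", "E", "G", "D", "I"],
    "col2": ["B", "D", "A", "F", "H", "C", "J"],
    "count": [3, 2, 5, 2, 8, 5, 4],
})

merged = merge_permuted_pairs(df)
print(merged)
```

Unlike the "first"-based groupby in Answer 1, this variant always reports each pair in sorted order, regardless of which orientation appeared first in the input.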

Tags: python, dataframe, pandas, permute