Iteratively merge pandas columns with new column names

Question

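Suppose I am iteratively merging a pandas DataFrame in a loop, but after two or three iterations pandas starts repeating column names. Consider the following example, where I merge iteratively but, for simplicity, without a loop: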

import pandas as pd

A = {'Name':['A','B','C'], 'GPA':[4.0,3.80,3.70], 'School':['U','U','U'], 'Time':[22,26,30]}
A1 = pd.DataFrame(A)
B = {'Name':['D','E','F'], 'GPA':[3.50,3.70,3.60], 'School':['S','S','S'], 'Time':[34,44,54]}
B1 = pd.DataFrame(B)
C = {'Name':['G','H','I'], 'GPA':[3.70,3.50,3.70], 'School':['C','C','C'], 'Time':[76,86,96]}
C1 = pd.DataFrame(C)
L = [A1, B1, C1]
# Stack the three frames row-wise into a single 9-row DataFrame
comb = A1
for ii in L[1:]:
    comb = pd.concat([comb, ii], ignore_index=True)
comb

B = pd.merge(comb, comb, on=['Name','GPA'])
C = pd.merge(B, comb, on=['Name','GPA'])
D = pd.merge(C, comb, on=['Name','GPA'])

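As you can see, pandas uses the names School_x and School_y twice. Is there a way to get School_x, School_y, School_z and School_t instead? I am not talking about renaming afterwards, but about forcing the merge to pick new names for the columns that would otherwise collide. Otherwise, how would you tell the columns apart in a DataFrame with 1000 columns, 500 of which share the same names?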

Update: the above is just an example. Suppose you are merging several DataFrames in a loop like this:

  for ii in list:
      df  = df.merge(A,on = 'some column', how = 'outer')

So my question is: how do you change the column names on each iteration whenever the same columns repeat, even when using suffixes?

Answer 1

Try setting the suffixes parameter to a tuple such as ('_z', '_t'):

B = pd.merge(comb, comb, on=['Name','GPA'])
C = pd.merge(B, comb, on=['Name','GPA'])
D = pd.merge(C, comb, on=['Name','GPA'], suffixes=('_z', '_t'))

>>> D
  Name  GPA School_x  Time_x School_y  Time_y School_z  Time_z School_t  Time_t
0    A  4.0        U      22        U      22        U      22        U      22
1    B  3.8        U      26        U      26        U      26        U      26
2    C  3.7        U      30        U      30        U      30        U      30
3    D  3.5        S      34        S      34        S      34        S      34
4    E  3.7        S      44        S      44        S      44        S      44
5    F  3.6        S      54        S      54        S      54        S      54
6    G  3.7        C      76        C      76        C      76        C      76
7    H  3.5        C      86        C      86        C      86        C      86
8    I  3.7        C      96        C      96        C      96        C      96
>>>

As stated in the pd.merge documentation:

Parameters:
...
...

suffixes: list-like, default is (“_x”, “_y”)

A length-2 sequence where each element is optionally a string indicating the suffix to add to overlapping column names in left and right respectively. Pass a value of None instead of a string to indicate that the column name from left or right should be left as-is, with no suffix. At least one of the values must not be None.

... ...
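To make the quoted behaviour concrete, here is a minimal sketch on two small made-up frames (`left` and `right` are illustrative names, not from the question). It shows that suffixes are only applied to column names that actually overlap, which is why the second merge in a chain can add a plain, unsuffixed column, and how passing None keeps one side's name unchanged:

```python
import pandas as pd

# Two small illustrative frames sharing the key "Name" and the column "School"
left = pd.DataFrame({"Name": ["A", "B"], "School": ["U", "S"]})
right = pd.DataFrame({"Name": ["A", "B"], "School": ["U", "S"]})

# "School" exists on both sides, so both suffixes are applied.
first = pd.merge(left, right, on="Name", suffixes=("_x", "_y"))
print(list(first.columns))   # ['Name', 'School_x', 'School_y']

# `first` no longer has a column called "School", so nothing overlaps
# and the suffixes passed here are simply not used.
second = pd.merge(first, right, on="Name", suffixes=("_z", "_t"))
print(list(second.columns))  # ['Name', 'School_x', 'School_y', 'School']

# None leaves the left name as-is and only tags the right-hand column.
third = pd.merge(left, right, on="Name", suffixes=(None, "_right"))
print(list(third.columns))   # ['Name', 'School', 'School_right']
```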

Edit:

For the latest update to the question, try creating an iterator and calling next on it to get a fresh suffix each time.
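As a rough sketch of that idea in a plain loop like the one in the question: the frames below are made-up stand-ins, and using `itertools.count` as the suffix source together with `suffixes=(None, ...)` is my own assumption about how you might wire it up, not the only way. Each pass leaves the already merged columns alone and tags only the incoming ones:

```python
from itertools import count

import pandas as pd

# Illustrative stand-ins for the frames merged in the loop
base = pd.DataFrame({"Name": ["A", "B", "C"],
                     "GPA": [4.0, 3.8, 3.7],
                     "School": ["U", "U", "U"]})
others = [base.copy() for _ in range(3)]

counter = count(1)  # assumed suffix source: 1, 2, 3, ... and never exhausted
df = base
for other in others:
    # None keeps the accumulated left-hand names; only the new columns get a suffix.
    df = df.merge(other, on=["Name", "GPA"], how="outer",
                  suffixes=(None, f"_{next(counter)}"))

print(list(df.columns))
# ['Name', 'GPA', 'School', 'School_1', 'School_2', 'School_3']
```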

functools.reduce works even better:

from functools import reduce
from string import ascii_lowercase
it = iter(ascii_lowercase)
print(reduce(lambda x, y: pd.merge(x, y, on=['Name','GPA'], suffixes=('_' + next(it), '_' + next(it))), [comb for _ in range(4)]))

Output:

  Name  GPA School_a  Time_a School_b  Time_b School_e  Time_e School_f  Time_f
0    A  4.0        U      22        U      22        U      22        U      22
1    B  3.8        U      26        U      26        U      26        U      26
2    C  3.7        U      30        U      30        U      30        U      30
3    D  3.5        S      34        S      34        S      34        S      34
4    E  3.7        S      44        S      44        S      44        S      44
5    F  3.6        S      54        S      54        S      54        S      54
6    G  3.7        C      76        C      76        C      76        C      76
7    H  3.5        C      86        C      86        C      86        C      86
8    I  3.7        C      96        C      96        C      96        C      96

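As you can see, the list comprehension [comb for _ in range(4)] builds a list containing comb 4 times, so reduce performs 3 merges. Suffixes are only applied when column names actually overlap, which is why the pair '_c', '_d' is consumed but never shows up in the output. To change the number of frames, just change the number, e.g. [comb for _ in range(10)].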

As a function:

import pandas as pd
from functools import reduce
from string import ascii_lowercase

def cumulative_merge(df, n):
    # Merge n copies of df on Name and GPA, drawing a fresh suffix pair for every merge
    it = iter(ascii_lowercase)
    return reduce(lambda x, y: pd.merge(x, y, on=['Name','GPA'], suffixes=('_' + next(it), '_' + next(it))), [df for _ in range(n)])

Usage:

print(cumulative_merge(comb, 4))

Output:

  Name  GPA School_a  Time_a School_b  Time_b School_e  Time_e School_f  Time_f
0    A  4.0        U      22        U      22        U      22        U      22
1    B  3.8        U      26        U      26        U      26        U      26
2    C  3.7        U      30        U      30        U      30        U      30
3    D  3.5        S      34        S      34        S      34        S      34
4    E  3.7        S      44        S      44        S      44        S      44
5    F  3.6        S      54        S      54        S      54        S      54
6    G  3.7        C      76        C      76        C      76        C      76
7    H  3.5        C      86        C      86        C      86        C      86
8    I  3.7        C      96        C      96        C      96        C      96
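One limitation of the letter iterator is that ascii_lowercase has only 26 letters and each merge takes two, so it runs out after 13 merges. If that matters, a possible alternative (a sketch only; `merge_with_numbered_suffixes` is a hypothetical helper, not a pandas function) is to tag each frame's non-key columns with its position before merging, so that no two frames ever share a column name:

```python
from functools import reduce

import pandas as pd


def merge_with_numbered_suffixes(frames, on):
    """Merge frames on `on`, tagging every non-key column with the frame's position."""
    tagged = []
    for i, frame in enumerate(frames):
        # Rename non-key columns up front, e.g. School -> School_0, School_1, ...
        mapping = {c: f"{c}_{i}" for c in frame.columns if c not in on}
        tagged.append(frame.rename(columns=mapping))
    # No names overlap any more, so pandas never needs to add its own suffixes.
    return reduce(lambda left, right: pd.merge(left, right, on=on), tagged)


# Same data as `comb` in the question, rebuilt here so the sketch is self-contained
comb = pd.DataFrame({
    "Name": list("ABCDEFGHI"),
    "GPA": [4.0, 3.8, 3.7, 3.5, 3.7, 3.6, 3.7, 3.5, 3.7],
    "School": list("UUUSSSCCC"),
    "Time": [22, 26, 30, 34, 44, 54, 76, 86, 96],
})

result = merge_with_numbered_suffixes([comb] * 4, on=["Name", "GPA"])
print(list(result.columns))
```

This renames before the merge rather than after it, so the merged result never contains ambiguous names, and the numbering scales to any number of frames.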
