How to map values of a pandas MultiIndex DataFrame to another MultiIndex DataFrame with a different shape?

I have the following two MultiIndex DataFrames with different shapes. Pandas DataFrame 'a':

import numpy as np
import pandas as pd

col0 = ['Set 1','Set 1','Set 1','Set 1','Set 2','Set 2','Set 2','Set 2','Set 2','Set 2']
col1 = ['paa','paa','jaa','paa','jaa','jaa','jaa','paa','paa','paa']
a = pd.DataFrame(data = np.random.randint(6, size=(3, 10)), columns = [col0,col1])

Output:

    Set 1             Set 2                    
    paa paa jaa paa   jaa jaa jaa paa paa paa
0     3   0   2   1     2   0   3   5   4   3
1     2   1   2   1     0   5   5   5   3   4
2     5   2   1   2     5   1   5   5   0   2

and DataFrame 'b':

col0 = ['Set 1','Set 1','Set 2','Set 2']
col1 = ['P1_1','P1_2','P2_1','P2_2']
b = pd.DataFrame(data = np.random.randint(3, size=(3, 4)), columns = [col0,col1])

Output:

   Set 1      Set 2     
   P1_1 P1_2  P2_1 P2_2
0     2    1     1    2
1     0    0     2    2
2     0    0     1    0

Now I want to combine the two, keeping the MultiIndex of pandas 'a' and the values of pandas 'b'.

Desired output for pandas 'c':

      Set 1                   Set 2                    
      P1_1  P1_2  P1_1  P1_2  P2_1  P2_2  P2_1  P2_2  P2_1  P2_2
0     2     1     2     1     1     2     1     2     1     2
1     0     0     0     0     2     2     2     2     2     2
2     0     0     0     0     1     0     1     0     1     0

Level_0 of pandas 'c' coincides with level_0 of pandas 'b'. Level_1 in 'c' alternates between the level-1 columns of pandas 'b'.

You probably need some combination of the following:

temp=b.reindex(columns=map(lambda x:(x[0],'P1_1') ,a.columns))
a.groupby(level=0, axis=1)
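For the `a.groupby(level=0, axis=1)` part: what the grouping mainly has to tell you is how many columns each top-level label owns in 'a' and in 'b'. A minimal sketch that reads those counts directly off the column MultiIndex, which avoids `groupby(..., axis=1)` (deprecated since pandas 2.1). The frames are rebuilt from the question so the snippet runs on its own; `n_a` and `n_b` are illustrative names:

```python
import numpy as np
import pandas as pd

col0 = ['Set 1'] * 4 + ['Set 2'] * 6
col1 = ['paa', 'paa', 'jaa', 'paa', 'jaa', 'jaa', 'jaa', 'paa', 'paa', 'paa']
a = pd.DataFrame(np.random.randint(6, size=(3, 10)), columns=[col0, col1])

col0 = ['Set 1', 'Set 1', 'Set 2', 'Set 2']
col1 = ['P1_1', 'P1_2', 'P2_1', 'P2_2']
b = pd.DataFrame(np.random.randint(3, size=(3, 4)), columns=[col0, col1])

# Number of columns per level-0 label ("Set 1", "Set 2") in a and in b
n_a = a.columns.get_level_values(0).value_counts()
n_b = b.columns.get_level_values(0).value_counts()
print(n_a)
print(n_b)
```

Here 'Set 1' has 4 columns in 'a' but 2 in 'b', and 'Set 2' has 6 versus 2, so b's level-1 labels must be repeated 2 and 3 times respectively.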

Any help is appreciated!

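The idea is to match the level-0 labels of a and b, repeat b's second-level columns to the number of columns each group has in a, and use the resulting MultiIndex with DataFrame.reindex: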

import numpy as np
import pandas as pd

np.random.seed(123)

col0 = ['Set 1','Set 1','Set 1','Set 1','Set 2','Set 2','Set 2','Set 2','Set 2','Set 2']
col1 = ['paa','paa','jaa','paa','jaa','jaa','jaa','paa','paa','paa']
a = pd.DataFrame(data = np.random.randint(6, size=(3, 10)), columns = [col0,col1])

col0 = ['Set 1','Set 1','Set 2','Set 2']
col1 = ['P1_1','P1_2','P2_1','P2_2']
b = pd.DataFrame(data = np.random.randint(3, size=(3, 4)), columns = [col0,col1])


print (a)
  Set 1             Set 2                    
    paa paa jaa paa   jaa jaa jaa paa paa paa
0     5   2   4   2     1   3   2   3   1   1
1     0   1   1   0     0   1   3   5   4   0
2     0   4   1   3     2   4   2   4   0   5

print (b)
  Set 1      Set 2     
   P1_1 P1_2  P2_1 P2_2
0     0    1     0    0
1     0    2     1    1
2     2    2     2    1

# repeat the list s cyclically until it has exactly `wanted` elements
def repeat_to_length(s, wanted):
    return (s * (wanted//len(s) + 1))[:wanted]
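The slicing trick in `repeat_to_length` can also be written with `itertools`. A minimal drop-in sketch with the same signature (an alternative, not what this answer uses):

```python
from itertools import cycle, islice


def repeat_to_length(s, wanted):
    """Return the first `wanted` items of s repeated end to end."""
    # cycle() repeats s forever; islice() stops after `wanted` items
    return list(islice(cycle(s), wanted))


print(repeat_to_length(['P1_1', 'P1_2'], 4))  # ['P1_1', 'P1_2', 'P1_1', 'P1_2']
print(repeat_to_length(['P2_1', 'P2_2'], 5))  # ['P2_1', 'P2_2', 'P2_1', 'P2_2', 'P2_1']
```

Unlike the slicing version, it returns an empty list for an empty `s` instead of raising `ZeroDivisionError`.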


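out = []
# unique level-0 labels of a, in order of appearance
# (a.columns.levels[0] is sorted and can keep unused labels)
for lvl in a.columns.get_level_values(0).unique():
    # level-1 labels of this group in a and in b
    colsa = a.xs(lvl, axis=1, level=0).columns.tolist()
    colsb = b.xs(lvl, axis=1, level=0).columns.tolist()
    # repeat b's labels until there is one per column of a
    lvl1 = repeat_to_length(colsb, len(colsa))
    out.extend(list(zip([lvl] * len(lvl1), lvl1)))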

print (out)
[('Set 1', 'P1_1'), ('Set 1', 'P1_2'), ('Set 1', 'P1_1'), 
 ('Set 1', 'P1_2'), ('Set 2', 'P2_1'), ('Set 2', 'P2_2'), 
 ('Set 2', 'P2_1'), ('Set 2', 'P2_2'), ('Set 2', 'P2_1'), ('Set 2', 'P2_2')]

mux = pd.MultiIndex.from_tuples(out)
print (mux)
MultiIndex([('Set 1', 'P1_1'),
            ('Set 1', 'P1_2'),
            ('Set 1', 'P1_1'),
            ('Set 1', 'P1_2'),
            ('Set 2', 'P2_1'),
            ('Set 2', 'P2_2'),
            ('Set 2', 'P2_1'),
            ('Set 2', 'P2_2'),
            ('Set 2', 'P2_1'),
            ('Set 2', 'P2_2')],
           )

c = b.reindex(mux, axis=1)
print (c)
  Set 1                Set 2                         
   P1_1 P1_2 P1_1 P1_2  P2_1 P2_2 P2_1 P2_2 P2_1 P2_2
0     0    1    0    1     0    0    0    0    0    0
1     0    2    0    2     1    1    1    1    1    1
2     2    2    2    2     2    1    2    1    2    1
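A variant of the same idea that does not build the groups one by one: number each column of 'a' within its level-0 group with `GroupBy.cumcount`, then pick b's level-1 label at that position modulo the group size. This is a sketch rather than part of the answer above; `b_labels`, `pos` and `tuples` are illustrative names:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
col0 = ['Set 1'] * 4 + ['Set 2'] * 6
col1 = ['paa', 'paa', 'jaa', 'paa', 'jaa', 'jaa', 'jaa', 'paa', 'paa', 'paa']
a = pd.DataFrame(np.random.randint(6, size=(3, 10)), columns=[col0, col1])

col0 = ['Set 1', 'Set 1', 'Set 2', 'Set 2']
col1 = ['P1_1', 'P1_2', 'P2_1', 'P2_2']
b = pd.DataFrame(np.random.randint(3, size=(3, 4)), columns=[col0, col1])

# Level-1 labels of b collected per level-0 label: {'Set 1': ['P1_1', 'P1_2'], ...}
b_labels = {}
for l0, l1 in b.columns:
    b_labels.setdefault(l0, []).append(l1)

# Level-0 label of every column of a, and the column's position inside its group
lvl0 = pd.Series(a.columns.get_level_values(0))
pos = lvl0.groupby(lvl0).cumcount()

# Position p in group g takes b's label number p % len(b_labels[g]), i.e. cycling
tuples = [(g, b_labels[g][p % len(b_labels[g])]) for g, p in zip(lvl0, pos)]
mux = pd.MultiIndex.from_tuples(tuples)

# Duplicated target labels are fine: each one pulls a copy of b's column
c = b.reindex(columns=mux)
print(c)
```

Because it walks the columns of 'a' in their existing order, `c` gets exactly the level-0 layout of 'a', even if a's level-0 labels are interleaved (e.g. Set 1, Set 2, Set 1). A level-0 label of 'a' that is missing from 'b' raises a `KeyError` in the `b_labels` lookup.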