How to map values of a pandas MultiIndex dataframe to another MultiIndex dataframe having a different shape?
I have the following two MultiIndex dataframes with different shapes:
Pandas dataframe 'a':
col0 = ['Set 1','Set 1','Set 1','Set 1','Set 2','Set 2','Set 2','Set 2','Set 2','Set 2']
col1 = ['paa','paa','jaa','paa','jaa','jaa','jaa','paa','paa','paa']
a = pd.DataFrame(data = np.random.randint(6, size=(3, 10)), columns = [col0,col1])
Output:
   Set 1                 Set 2
     paa  paa  jaa  paa    jaa  jaa  jaa  paa  paa  paa
0      3    0    2    1      2    0    3    5    4    3
1      2    1    2    1      0    5    5    5    3    4
2      5    2    1    2      5    1    5    5    0    2
and dataframe 'b':
col0 = ['Set 1','Set 1','Set 2','Set 2']
col1 = ['P1_1','P1_2','P2_1','P2_2']
b = pd.DataFrame(data = np.random.randint(3, size=(3, 4)), columns = [col0,col1])
Output:
   Set 1        Set 2
    P1_1  P1_2   P2_1  P2_2
0      2     1      1     2
1      0     0      2     2
2      0     0      1     0
Now I want to combine the two, keeping the MultiIndex of dataframe 'a' and the values of dataframe 'b'.
Desired output, dataframe 'c':
   Set 1                    Set 2
    P1_1  P1_2  P1_1  P1_2   P1_1  P1_2  P1_1  P1_2  P1_1  P1_2
0      2     1     2     1      1     2     1     2     1     2
1      0     0     0     0      2     2     2     2     2     2
2      0     0     0     0      1     0     1     0     1     0
Level 0 of 'c' coincides with level 0 of 'b'. Level 1 of 'c' alternates through the columns of 'b'.
You probably have to combine the following somehow:
temp=b.reindex(columns=map(lambda x:(x[0],'P1_1') ,a.columns))
a.groupby(level=0, axis=1)
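Any help would be appreciated!
The idea is to match the first level of a and b, and to repeat the second-level columns of b until each group is as long as the corresponding group in a. The resulting tuples are then passed to DataFrame.reindex: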
import numpy as np
import pandas as pd

np.random.seed(123)
col0 = ['Set 1','Set 1','Set 1','Set 1','Set 2','Set 2','Set 2','Set 2','Set 2','Set 2']
col1 = ['paa','paa','jaa','paa','jaa','jaa','jaa','paa','paa','paa']
a = pd.DataFrame(data = np.random.randint(6, size=(3, 10)), columns = [col0,col1])
col0 = ['Set 1','Set 1','Set 2','Set 2']
col1 = ['P1_1','P1_2','P2_1','P2_2']
b = pd.DataFrame(data = np.random.randint(3, size=(3, 4)), columns = [col0,col1])
print (a)
   Set 1                 Set 2
     paa  paa  jaa  paa    jaa  jaa  jaa  paa  paa  paa
0      5    2    4    2      1    3    2    3    1    1
1      0    1    1    0      0    1    3    5    4    0
2      0    4    1    3      2    4    2    4    0    5
print (b)
   Set 1        Set 2
    P1_1  P1_2   P2_1  P2_2
0      0     1      0     0
1      0     2      1     1
2      2     2      2     1
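# repeat the list s cyclically until it has exactly `wanted` items
def repeat_to_length(s, wanted):
    return (s * (wanted//len(s) + 1))[:wanted]

out = []
# for each first-level label, cycle b's second-level labels
# so that the group has as many columns as the same group in a
for lvl in a.columns.levels[0]:
    colsa = a.xs(lvl, axis=1, level=0).columns.tolist()
    colsb = b.xs(lvl, axis=1, level=0).columns.tolist()
    lvl1 = repeat_to_length(colsb, len(colsa))
    out.extend(list(zip([lvl] * len(lvl1), lvl1)))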
print (out)
[('Set 1', 'P1_1'), ('Set 1', 'P1_2'), ('Set 1', 'P1_1'),
('Set 1', 'P1_2'), ('Set 2', 'P2_1'), ('Set 2', 'P2_2'),
('Set 2', 'P2_1'), ('Set 2', 'P2_2'), ('Set 2', 'P2_1'), ('Set 2', 'P2_2')]
mux = pd.MultiIndex.from_tuples(out)
print (mux)
MultiIndex([('Set 1', 'P1_1'),
            ('Set 1', 'P1_2'),
            ('Set 1', 'P1_1'),
            ('Set 1', 'P1_2'),
            ('Set 2', 'P2_1'),
            ('Set 2', 'P2_2'),
            ('Set 2', 'P2_1'),
            ('Set 2', 'P2_2'),
            ('Set 2', 'P2_1'),
            ('Set 2', 'P2_2')],
           )
c = b.reindex(mux, axis=1)
print (c)
   Set 1                    Set 2
    P1_1  P1_2  P1_1  P1_2   P2_1  P2_2  P2_1  P2_2  P2_1  P2_2
0      0     1     0     1      0     0     0     0     0     0
1      0     2     0     2      1     1     1     1     1     1
2      2     2     2     2      2     1     2     1     2     1
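The steps above can also be wrapped in a single helper. The following is only a sketch under a few assumptions: the function name cycle_second_level is made up for illustration, every first-level label of a is assumed to exist in b, and the columns of b are assumed to be unique (reindex requires that). It swaps the list-multiplication trick for itertools.cycle and itertools.islice, and it walks the first-level labels in order of appearance rather than through a.columns.levels[0], which is sorted. The data for b is hard-coded from the printed output above so the result does not depend on the random seed.

```python
from itertools import cycle, islice

import numpy as np
import pandas as pd


def cycle_second_level(a, b):
    """Reindex b's columns so each first-level group is as wide as in a.

    Hypothetical helper: b's second-level labels are repeated cyclically
    within each first-level group.
    """
    a_lvl0 = a.columns.get_level_values(0)
    b_lvl0 = b.columns.get_level_values(0)
    b_lvl1 = b.columns.get_level_values(1)

    tuples = []
    # unique() keeps the order of first appearance
    for lvl in a_lvl0.unique():
        n_wanted = int((a_lvl0 == lvl).sum())      # columns of this group in a
        labels_b = b_lvl1[b_lvl0 == lvl].tolist()  # second-level labels in b
        tuples.extend((lvl, lab) for lab in islice(cycle(labels_b), n_wanted))

    # duplicate labels in the target are fine as long as b's columns are unique
    return b.reindex(columns=pd.MultiIndex.from_tuples(tuples))


a = pd.DataFrame(
    np.arange(30).reshape(3, 10),
    columns=[['Set 1'] * 4 + ['Set 2'] * 6,
             ['paa', 'paa', 'jaa', 'paa', 'jaa', 'jaa', 'jaa', 'paa', 'paa', 'paa']],
)
b = pd.DataFrame(
    [[0, 1, 0, 0], [0, 2, 1, 1], [2, 2, 2, 1]],
    columns=[['Set 1', 'Set 1', 'Set 2', 'Set 2'],
             ['P1_1', 'P1_2', 'P2_1', 'P2_2']],
)

c = cycle_second_level(a, b)
print(c)
```

With this b, the printed frame should match the c shown above. Only the column layout of a is used; its values play no role.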