向多 header 数据框中的每个 parent 列添加一列
Add a column to each parent column in multi header dataframe
所以我有一个函数,returns 一个格式类似于下面的 df 的数据框。
header = pd.MultiIndex.from_product([['location1','location2'],
['S1','S2','S3']],
names=['loc','S'])
df = pd.DataFrame(np.random.randn(5, 6),
index=['a','b','c','d','e'],
columns=header)
df
loc location1 location2
S S1 S2 S3 S1 S2 S3
a -1.245988 0.858071 -1.433669 0.105300 -0.630531 -0.148113
b 1.132016 0.318813 0.949564 -0.349722 -0.904325 0.443206
c -0.017991 0.032925 0.274248 0.326454 -0.108982 0.567472
d 2.363533 -1.676141 0.562893 0.967338 -1.071719 -0.321113
e 1.921324 0.110705 0.023244 -0.432196 0.172972 -0.50368
我想做的是在每个位置创建一个新列,像这样创建一个 table。
loc location1 location2
S S1 S2 S3 S4 S1 S2 S3 S4
a -1.245988 0.858071 -1.433669 -1.433669 0.105300 -0.630531 -0.148113 -1.433669
b 1.132016 0.318813 0.949564 0.949564 -0.349722 -0.904325 0.443206 0.949564
c -0.017991 0.032925 0.274248 0.274248 0.326454 -0.108982 0.567472 0.274248
d 2.363533 -1.676141 0.562893 0.562893 0.967338 -1.071719 -0.321113 0.562893
e 1.921324 0.110705 0.023244 0.023244 -0.432196 0.172972 -0.50368 0.023244
每次我尝试以与普通 DataFrame 类似的方式创建列时,它都会说索引不存在...
这可能吗?或者我是否需要用我需要的列重新创建数据框并填充数据?
感谢您的帮助。
如果需要用值 location1, S3
填充的新列,请使用:
df = df.stack(0)
df['S4'] = df.groupby(level=0)['S3'].transform('first')
df = df.unstack().swaplevel(1,0,axis=1).sort_index(axis=1)
print (df)
loc location1 location2 \
S S1 S2 S3 S4 S1 S2 S3
a 0.048870 0.883278 0.659195 0.659195 -1.172545 0.311774 -1.436998
b -0.151088 -0.994126 0.814953 0.814953 -0.097351 0.219566 -0.105680
c 0.164416 -1.241656 -0.950665 -0.950665 -0.413584 -1.626581 -0.116679
d -0.343600 1.652102 0.488441 0.488441 -0.972463 -0.583244 1.142386
e 1.744620 -0.841092 -0.420305 -0.420305 -1.524430 -0.201024 0.032714
loc
S S4
a 0.659195
b 0.814953
c -0.950665
d 0.488441
e -0.420305
另一种方法是按元组选择列,创建MultiIndex
并添加到原始:
mux = pd.MultiIndex.from_product([df.columns.levels[0], ['S4']], names=df.columns.names)
print (mux)
MultiIndex([('location1', 'S4'),
('location2', 'S4')],
names=['loc', 'S'])
df1 = df[('location1','S3')].to_frame('S4').reindex(mux, axis=1, level=1)
print (df1)
loc location1 location2
S S4 S4
a 1.308520 1.308520
b 0.451843 0.451843
c -2.162076 -2.162076
d 2.287326 2.287326
e 0.107393 0.107393
df = pd.concat([df, df1], axis=1).sort_index(axis=1)
print (df)
loc location1 location2 \
S S1 S2 S3 S4 S1 S2 S3
a 0.761594 -0.542844 1.308520 1.308520 -0.650264 1.139555 0.823617
b -0.419000 1.389820 0.451843 0.451843 -1.866177 -2.122505 -1.640534
c 1.040347 0.840024 -2.162076 -2.162076 0.538753 1.065711 0.745565
d -0.953894 0.173901 2.287326 2.287326 -1.564201 -0.081494 0.444838
e 0.906456 -0.832980 0.107393 0.107393 0.010980 0.376815 -0.367043
loc
S S4
a 1.308520
b 0.451843
c -2.162076
d 2.287326
e 0.107393
所以我有一个函数,returns 一个格式类似于下面的 df 的数据框。
header = pd.MultiIndex.from_product([['location1','location2'],
['S1','S2','S3']],
names=['loc','S'])
df = pd.DataFrame(np.random.randn(5, 6),
index=['a','b','c','d','e'],
columns=header)
df
loc location1 location2
S S1 S2 S3 S1 S2 S3
a -1.245988 0.858071 -1.433669 0.105300 -0.630531 -0.148113
b 1.132016 0.318813 0.949564 -0.349722 -0.904325 0.443206
c -0.017991 0.032925 0.274248 0.326454 -0.108982 0.567472
d 2.363533 -1.676141 0.562893 0.967338 -1.071719 -0.321113
e 1.921324 0.110705 0.023244 -0.432196 0.172972 -0.50368
我想做的是在每个位置创建一个新列,像这样创建一个 table。
loc location1 location2
S S1 S2 S3 S4 S1 S2 S3 S4
a -1.245988 0.858071 -1.433669 -1.433669 0.105300 -0.630531 -0.148113 -1.433669
b 1.132016 0.318813 0.949564 0.949564 -0.349722 -0.904325 0.443206 0.949564
c -0.017991 0.032925 0.274248 0.274248 0.326454 -0.108982 0.567472 0.274248
d 2.363533 -1.676141 0.562893 0.562893 0.967338 -1.071719 -0.321113 0.562893
e 1.921324 0.110705 0.023244 0.023244 -0.432196 0.172972 -0.50368 0.023244
每次我尝试以与普通 DataFrame 类似的方式创建列时,它都会说索引不存在...
这可能吗?或者我是否需要用我需要的列重新创建数据框并填充数据?
感谢您的帮助。
如果需要用值 location1, S3
填充的新列,请使用:
df = df.stack(0)
df['S4'] = df.groupby(level=0)['S3'].transform('first')
df = df.unstack().swaplevel(1,0,axis=1).sort_index(axis=1)
print (df)
loc location1 location2 \
S S1 S2 S3 S4 S1 S2 S3
a 0.048870 0.883278 0.659195 0.659195 -1.172545 0.311774 -1.436998
b -0.151088 -0.994126 0.814953 0.814953 -0.097351 0.219566 -0.105680
c 0.164416 -1.241656 -0.950665 -0.950665 -0.413584 -1.626581 -0.116679
d -0.343600 1.652102 0.488441 0.488441 -0.972463 -0.583244 1.142386
e 1.744620 -0.841092 -0.420305 -0.420305 -1.524430 -0.201024 0.032714
loc
S S4
a 0.659195
b 0.814953
c -0.950665
d 0.488441
e -0.420305
另一种方法是按元组选择列,创建MultiIndex
并添加到原始:
mux = pd.MultiIndex.from_product([df.columns.levels[0], ['S4']], names=df.columns.names)
print (mux)
MultiIndex([('location1', 'S4'),
('location2', 'S4')],
names=['loc', 'S'])
df1 = df[('location1','S3')].to_frame('S4').reindex(mux, axis=1, level=1)
print (df1)
loc location1 location2
S S4 S4
a 1.308520 1.308520
b 0.451843 0.451843
c -2.162076 -2.162076
d 2.287326 2.287326
e 0.107393 0.107393
df = pd.concat([df, df1], axis=1).sort_index(axis=1)
print (df)
loc location1 location2 \
S S1 S2 S3 S4 S1 S2 S3
a 0.761594 -0.542844 1.308520 1.308520 -0.650264 1.139555 0.823617
b -0.419000 1.389820 0.451843 0.451843 -1.866177 -2.122505 -1.640534
c 1.040347 0.840024 -2.162076 -2.162076 0.538753 1.065711 0.745565
d -0.953894 0.173901 2.287326 2.287326 -1.564201 -0.081494 0.444838
e 0.906456 -0.832980 0.107393 0.107393 0.010980 0.376815 -0.367043
loc
S S4
a 1.308520
b 0.451843
c -2.162076
d 2.287326
e 0.107393