Pandas long to wide with hierarchical column headers

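I've searched high and low but can't find an example that closely matches what I'm trying to do. In short, I ran multiple trials of an experiment with several variables. The data was collected and stored in a long DataFrame, like this (generated sample data):

Input: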

import itertools

import numpy as np
import pandas as pd

trials = ['trial_1', 'trial_2']
ppi = ['5_ppi', '10_ppi']
rd = ['7%', '12%']
frames = []

# One small frame per (rd, ppi, trial) combination, two data points each
for rd_val, ppi_val, trial in itertools.product(rd, ppi, trials):
    ext_data = np.random.rand(2)
    load_data = np.random.rand(2)
    frames.append(pd.DataFrame({'ext': ext_data, 'load': load_data, 'trial': trial, 'ppi': ppi_val, 'rd': rd_val}))

df = pd.concat(frames, axis=0)
print(df)

Output:

df
        ext      load    trial     ppi   rd
0  0.287997  0.874457  trial_1   5_ppi   7%
1  0.783776  0.878291  trial_1   5_ppi   7%
0  0.015054  0.886801  trial_2   5_ppi   7%
1  0.243617  0.234560  trial_2   5_ppi   7%
0  0.291621  0.519084  trial_1  10_ppi   7%
1  0.627786  0.072551  trial_1  10_ppi   7%
0  0.349199  0.235718  trial_2  10_ppi   7%
1  0.284535  0.328547  trial_2  10_ppi   7%
0  0.725747  0.688157  trial_1   5_ppi  12%
1  0.656839  0.297645  trial_1   5_ppi  12%
0  0.534276  0.794199  trial_2   5_ppi  12%
1  0.680596  0.381575  trial_2   5_ppi  12%
0  0.494404  0.246841  trial_1  10_ppi  12%
1  0.148489  0.549250  trial_1  10_ppi  12%
0  0.791440  0.372119  trial_2  10_ppi  12%
1  0.078047  0.552541  trial_2  10_ppi  12%

[16 rows x 5 columns]

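(In reality each trial has hundreds of data points, not the two shown here.)

I'd like to convert this into a wide DataFrame with hierarchical columns, organized as follows (I couldn't get markdown to make values/headers span multiple columns, so please excuse the screenshot).

I've tried a lot of things, but I've lost my way. I think pivot_table gets me closest to what I want, but it ends up aggregating the values into their mean instead of listing them, as shown here:

Input: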

df.pivot_table(columns=['rd', 'ppi', 'trial'])

Output:

rd          12%                                      7%                             
ppi      10_ppi               5_ppi              10_ppi              5_ppi          
trial   trial_1   trial_2   trial_1   trial_2   trial_1  trial_2   trial_1   trial_2
ext    0.619217  0.661812  0.652555  0.167241  0.340024  0.42324  0.565166  0.436858
load   0.397430  0.102965  0.528162  0.550871  0.206560  0.28524  0.731204  0.303079

[2 rows x 8 columns]
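A small sketch of why the values collapse: when no aggfunc is passed, pivot_table averages all rows that share the same keys. The three-row frame and the 'obs' counter column below are made up for illustration and are not part of the original data. Giving each observation its own row label appears to be enough to stop the averaging.

```python
import pandas as pd

# Hypothetical sample: trial_1 has two observations, trial_2 has one
df = pd.DataFrame({
    'ext':   [1.0, 3.0, 10.0],
    'load':  [2.0, 4.0, 20.0],
    'trial': ['trial_1', 'trial_1', 'trial_2'],
})

# With no aggfunc given, pivot_table averages rows that share the same keys
means = df.pivot_table(columns='trial')
print(means)

# A per-group counter ('obs', an assumed helper column) gives every
# observation its own row label, so each cell holds a single value
df['obs'] = df.groupby('trial').cumcount()
wide = df.pivot_table(index='obs', columns='trial', values=['ext', 'load'])
print(wide)
```

In the second result the shorter trial is padded with NaN, which is probably what you want when groups have different numbers of data points.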

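It's also possible that each group of data has a different number of data points (i.e. trial 1 might contain 2 data points while trial 2 has 3). I'm at my wits' end. What is the magic command to convert this long DataFrame into a wide DataFrame with multiple hierarchical columns?

This may not be the most efficient answer, but I think it achieves the result you want and should work for any number of observations.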

import pandas as pd

def reformat(df, column_name):
    # Collect each group's values into a list
    groups = df.groupby(['rd', 'ppi', 'trial'])[column_name].apply(list)
    temp = groups.reset_index(name='listvalues')

    # Make names for each of your new columns, sized to the longest group
    # so that shorter groups are padded with NaN
    n_obs = temp['listvalues'].map(len).max()
    col_names = [column_name + str(i) for i in range(n_obs)]
    # Split listvalues into a frame where every item is its own column
    listvalues = pd.DataFrame(temp['listvalues'].to_list(), columns=col_names)

    # Merge listvalues with the group keys and drop the extra listvalues column
    wide = temp.drop(columns='listvalues').join(listvalues)

    # Each group is now a single row, so the default mean leaves values unchanged
    return wide.pivot_table(columns=['rd', 'ppi', 'trial'])

# Do this twice and concatenate to get the desired dataframe
df2 = reformat(df, 'ext')
df3 = reformat(df, 'load')
finaldf = pd.concat([df2, df3])

What finaldf looks like:

Try this:

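# Last three columns in reverse order: ['rd', 'ppi', 'trial']
idx_cols = [*df][-1:-4:-1]
# Index by the group keys plus a running counter within each group
res = df.set_index(idx_cols + [df.groupby(idx_cols).cumcount()])
# Move ext/load into the index, then move everything except the counter to the columns
res = res.stack().unstack([0, 1, 2, -1])
print(res)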
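To see what that does on something small, here is a sketch on a made-up frame where the two trials have different numbers of observations. The sample values, the explicit 'keys' list, and the 'obs' name for the counter are my own choices for readability, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample: trial_1 has two observations, trial_2 has three
df = pd.DataFrame({
    'ext':   [0.1, 0.2, 0.3, 0.4, 0.5],
    'load':  [1.1, 1.2, 1.3, 1.4, 1.5],
    'trial': ['trial_1', 'trial_1', 'trial_2', 'trial_2', 'trial_2'],
    'ppi':   ['5_ppi'] * 5,
    'rd':    ['7%'] * 5,
})

keys = ['rd', 'ppi', 'trial']
# Running counter within each group; it becomes the row label of the wide frame
obs = df.groupby(keys).cumcount().rename('obs')
# stack() moves ext/load into the index; unstack() then moves the three keys
# and the ext/load level (position -1) into the columns
res = df.set_index(keys + [obs]).stack().unstack([0, 1, 2, -1])
print(res)
```

The shorter trial ends up with NaN in the rows it has no data for, so unequal group sizes should not be a problem. Depending on your pandas version, stack() may emit a FutureWarning about its new implementation.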

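I came up with what may be a more "correct" solution, but it required me to rearrange my original dataset so that the .pivot() method would work properly. Here is how I re-cast the data:

Input: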

import itertools

import numpy as np
import pandas as pd

trials = ['trial_1', 'trial_2']
ppi = ['5_ppi', '10_ppi']
rd = ['7%', '12%']
frames = []

for rd_val, ppi_val, trial in itertools.product(rd, ppi, trials):
    keys = {'trial': trial, 'ppi': ppi_val, 'rd': rd_val}
    # One frame per measurement type, tagged by the 'type' column
    frames.append(pd.DataFrame({'data': np.random.rand(2), 'type': 'ext', **keys}))
    frames.append(pd.DataFrame({'data': np.random.rand(2), 'type': 'load', **keys}))

df = pd.concat(frames, axis=0)
print(df)

Output:

df
       data  type    trial     ppi   rd
0  0.959315   ext  trial_1   5_ppi   7%
1  0.394340   ext  trial_1   5_ppi   7%
0  0.140045  load  trial_1   5_ppi   7%
1  0.519967  load  trial_1   5_ppi   7%
0  0.483302   ext  trial_2   5_ppi   7%
1  0.552380   ext  trial_2   5_ppi   7%
0  0.907199  load  trial_2   5_ppi   7%
1  0.123719  load  trial_2   5_ppi   7%
0  0.190914   ext  trial_1  10_ppi   7%
1  0.053163   ext  trial_1  10_ppi   7%
0  0.085914  load  trial_1  10_ppi   7%
1  0.749197  load  trial_1  10_ppi   7%
0  0.112615   ext  trial_2  10_ppi   7%
1  0.363111   ext  trial_2  10_ppi   7%
0  0.508180  load  trial_2  10_ppi   7%
1  0.459821  load  trial_2  10_ppi   7%
0  0.346808   ext  trial_1   5_ppi  12%
1  0.322950   ext  trial_1   5_ppi  12%
0  0.642119  load  trial_1   5_ppi  12%
1  0.101987  load  trial_1   5_ppi  12%
0  0.488866   ext  trial_2   5_ppi  12%
1  0.583071   ext  trial_2   5_ppi  12%
0  0.119333  load  trial_2   5_ppi  12%
1  0.800356  load  trial_2   5_ppi  12%
0  0.733883   ext  trial_1  10_ppi  12%
1  0.856037   ext  trial_1  10_ppi  12%
0  0.980035  load  trial_1  10_ppi  12%
1  0.364698  load  trial_1  10_ppi  12%
0  0.697155   ext  trial_2  10_ppi  12%
1  0.712375   ext  trial_2  10_ppi  12%
0  0.285191  load  trial_2  10_ppi  12%
1  0.198097  load  trial_2  10_ppi  12%

[32 rows x 5 columns]

Then, with the data fully cast as shown above, I applied .pivot() to it, like so:

Input:

final_df = df.pivot(columns=['rd', 'ppi', 'trial', 'type'])

Output:

final_df
           data                                                                                                                                                      
rd           7%                                                                             12%                                                                      
ppi       5_ppi                                  10_ppi                                   5_ppi                                  10_ppi                              
trial   trial_1             trial_2             trial_1             trial_2             trial_1             trial_2             trial_1             trial_2          
type        ext      load       ext      load       ext      load       ext      load       ext      load       ext      load       ext      load       ext      load
0      0.953832  0.012920  0.929764  0.069459  0.406110  0.866707  0.372693  0.815767  0.632988  0.310581  0.027626  0.416959  0.742982  0.340738  0.287946  0.294494
1      0.355673  0.663347  0.363117  0.860274  0.619436  0.146213  0.525354  0.038739  0.579613  0.488108  0.734074  0.794760  0.399273  0.517228  0.736619  0.860785

[2 rows x 16 columns]

Presto!
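If the data already exists in the original ext/load layout, regenerating it may not be necessary. A minimal sketch, assuming melt is acceptable here: it stacks the two measurement columns into the same data/type shape before pivoting. The sample values and the 'obs' counter column are hypothetical; the counter stands in for the repeated 0, 1 index used above.

```python
import pandas as pd

# Hypothetical sample in the question's original layout
df = pd.DataFrame({
    'ext':   [0.1, 0.2, 0.3, 0.4],
    'load':  [1.1, 1.2, 1.3, 1.4],
    'trial': ['trial_1', 'trial_1', 'trial_2', 'trial_2'],
    'ppi':   ['5_ppi'] * 4,
    'rd':    ['7%'] * 4,
})

keys = ['rd', 'ppi', 'trial']
# Number the observations within each group so they can serve as the row index
df['obs'] = df.groupby(keys).cumcount()

# Stack ext and load into a single 'data' column labelled by 'type'
long_df = df.melt(id_vars=keys + ['obs'], value_vars=['ext', 'load'],
                  var_name='type', value_name='data')

# No aggregation happens here: pivot raises if any (obs, keys, type) repeats
final_df = long_df.pivot(index='obs', columns=keys + ['type'], values='data')
print(final_df)
```

Passing a list to columns in pivot needs a reasonably recent pandas (1.1 or later, as far as I know).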