Creating a DataFrame with a multi-level column index from four 2D NumPy arrays

I have four 2D NumPy arrays:

import numpy as np
import pandas as pd    
x1 = np.array([[2, 4, 1],
       [2, 2, 1],
       [1, 3, 3],
       [2, 2, 1],
       [3, 3, 2]])
x2 = np.array([[1, 2, 2],
       [4, 1, 4],
       [1, 4, 4],
       [3, 3, 2],
       [2, 2, 4]])

x3 = np.array([[4, 3, 2],
       [4, 3, 2],
       [4, 3, 3],
       [1, 2, 2],
       [1, 4, 3]])      
x4 = np.array([[3, 1, 1],
       [3, 4, 3],
       [2, 2, 1],
       [2, 1, 1],
       [1, 2, 4]])

I want to create a DataFrame as follows:

level_1_label = ['location1','location2','location3']
level_2_label = ['x1','x2','x3','x4']
header = pd.MultiIndex.from_product([level_1_label, level_2_label], names=['Location','Variable'])
df = pd.DataFrame(np.concatenate((x1,x2,x3,x4),axis=1), columns=header)
df.index.name = 'Time'
df

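The data in the resulting DataFrame is not in the desired format.

I want the four columns (x1, x2, x3, x4) under the first level-one label (location1) to be built from the first column of each of the four NumPy arrays. The next four columns (x1, x2, x3, x4), i.e. those under the second level-one label (location2), should be built from the second column of each array, and so on. The number of level-one labels, i.e. len(level_1_label), equals the number of columns in each of the four 2D NumPy arrays.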

[Figure: the desired DataFrame]

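One option is to reverse the order of the levels when creating the MultiIndex columns (since level_1_label corresponds to the columns of each array and level_2_label corresponds to the arrays themselves), then apply swaplevel + sort_index after constructing the DataFrame to get the columns in the desired order: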

level_1_label = ['location1','location2','location3']
level_2_label = ['x1','x2','x3','x4']
header = pd.MultiIndex.from_product([level_2_label, level_1_label], names=['Variable','Location'])
df = pd.DataFrame(np.concatenate((x1,x2,x3,x4),axis=1), columns=header).swaplevel(axis=1).sort_index(level=0, axis=1)
df.index.name = 'Time'

Output:

Location    location1          location2          location3         
Variable    x1 x2 x3 x4        x1 x2 x3 x4        x1 x2 x3 x4
Time                                                          
0            2  1  4  3         4  2  3  1         1  2  2  1
1            2  4  4  3         2  1  3  4         1  4  2  3
2            1  1  4  2         3  4  3  2         3  4  3  1
3            2  3  1  2         2  3  2  1         1  2  2  1
4            3  2  1  1         3  2  4  2         2  4  3  4
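To see what swaplevel and sort_index each contribute, here is a minimal sketch on smaller, hypothetical data. The arrays `a` and `b` and the label lists are made up for illustration and are not the data from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical small arrays (2 rows, 3 columns), not the question's data
a = np.array([[1, 2, 3], [4, 5, 6]])
b = a + 10
locations = ['location1', 'location2', 'location3']
variables = ['a', 'b']

# After concatenation the columns run array by array: a's columns, then b's,
# so the array name has to be the outer level at this point
cols = pd.MultiIndex.from_product([variables, locations],
                                  names=['Variable', 'Location'])
wide = pd.DataFrame(np.concatenate((a, b), axis=1), columns=cols)

# Swapping only relabels: Location moves on top, the column order is unchanged
swapped = wide.swaplevel(axis=1)
print(list(swapped.columns)[:3])
# [('location1', 'a'), ('location2', 'a'), ('location3', 'a')]

# Sorting groups the columns by location, then by variable
result = swapped.sort_index(axis=1)
print(result)
```

Note that sort_index orders the labels lexicographically, so with labels such as location10, or variable names that are not already in alphabetical order, the sorted result would not follow the original label order. In that case, reindexing against a header built in the desired order, e.g. `swapped.reindex(columns=pd.MultiIndex.from_product([locations, variables]))`, should keep the intended order.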

Another option is to reshape the data in Fortran order before creating the DataFrame:


# reusing your code
level_1_label = ['location1','location2','location3']
level_2_label = ['x1','x2','x3','x4']
header = pd.MultiIndex.from_product([level_1_label, level_2_label], names=['Location','Variable'])

# np.vstack is just a convenience wrapper around np.concatenate, axis=0
outcome = np.reshape(np.vstack([x1,x2,x3,x4]), (len(x1), -1), order = 'F')
df = pd.DataFrame(outcome, columns = header)
df.index.name = 'Time'

df

Location location1          location2          location3
Variable        x1 x2 x3 x4        x1 x2 x3 x4        x1 x2 x3 x4
Time
0                2  1  4  3         4  2  3  1         1  2  2  1
1                2  4  4  3         2  1  3  4         1  4  2  3
2                1  1  4  2         3  4  3  2         3  4  3  1
3                2  3  1  2         2  3  2  1         1  2  2  1
4                3  2  1  1         3  2  4  2         2  4  3  4
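If the Fortran-order reshape feels opaque, a small sketch on hypothetical arrays may help show why it interleaves the columns. The arrays `a`, `b`, `c` below are made up for illustration. The np.stack variant is not part of the answer above; it is an alternative added here only for comparison:

```python
import numpy as np

# Hypothetical small arrays (2 rows, 3 columns) standing in for x1..x4
a = np.array([[1, 2, 3], [4, 5, 6]])
b = a + 10
c = a + 20

# Shape (6, 3): the arrays are stacked on top of each other
stacked = np.vstack([a, b, c])

# order='F' reads the stacked array column by column and fills the result
# column by column, so each chunk of len(a) values (one column of one array)
# becomes one output column
by_fortran = stacked.reshape((len(a), -1), order='F')
print(by_fortran)
# [[ 1 11 21  2 12 22  3 13 23]
#  [ 4 14 24  5 15 25  6 16 26]]

# Alternative (not from the answer): put the arrays on a new last axis and
# flatten the last two axes in the default C order
by_stack = np.stack([a, b, c], axis=2).reshape(len(a), -1)
print(np.array_equal(by_fortran, by_stack))  # True
```

Both forms give the column order first column of every array, then second column of every array, and so on, which matches a header built with from_product([level_1_label, level_2_label]).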