Easy multidimensional numpy ndarray to pandas DataFrame method?

I have a 4-D numpy.ndarray, e.g.

myarr = np.random.rand(10,4,3,2)
dims={'time':range(1,11),'sub':range(1,5),'cond':['A','B','C'],'measure':['meas1','meas2']}

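but potentially with even more dimensions. How can I create a pandas.DataFrame with a MultiIndex, just passing the dimensions as indexes, without further manual adjustment (reshaping the ndarray into a 2-D shape)?

I can't wrap my head around the reshaping, not even really in 3 dimensions, so I'm looking for an 'automatic' method if possible.

What would be a function to which I can pass the column/row indexes and which creates the dataframe? Something like: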

df=nd2df(myarr,dim2row=[0,1],dim2col=[2,3],rowlab=['time','sub'],collab=['cond','measure'])

and get something like:

              meas1             meas2
              A     B     C     A    B    C
sub   time
  1      1
         2
         3
         .
         .
  2      1
         2
 ...

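If it's not possible/feasible to do this automatically, an explanation that is less terse than the MultiIndexing manual is appreciated.

I can't even get it right when I don't care about the order of the dimensions, e.g. I would expect this to work: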

a=np.arange(24).reshape((3,2,2,2))
iterables=[[1,2,3],[1,2],['m1','m2'],['A','B']]
index = pd.MultiIndex.from_product(iterables, names=['time','sub','meas','cond'])



pd.DataFrame(a.reshape(2*3*1,2*2),index)

gives:

ValueError: Shape of passed values is (4, 6), indices imply (4, 24)

I still don't know how to do this directly, but here is an easy-to-follow step-by-step approach:

import numpy as np
import pandas as pd

# Create 4D-array
a=np.arange(24).reshape((3,2,2,2))
# Set only one row index
rowiter=[[1,2,3]]
row_ind=pd.MultiIndex.from_product(rowiter, names=['time'])
# Put the rest of the dimensions into columns
coliter=[[1,2],['m1','m2'],['A','B']]
col_ind=pd.MultiIndex.from_product(coliter, names=['sub','meas','cond'])
ncols=np.prod([len(x) for x in coliter])
b=pd.DataFrame(a.reshape(len(rowiter[0]),ncols),index=row_ind,columns=col_ind)
print(b)
# Move column levels into the rows as desired:
b=b.stack('sub')
# Swap the row levels, then sort by the new outer level
# (sortlevel was removed from pandas; sort_index replaces it):
c=b.swaplevel(0,1,axis=0).sort_index(level=0,axis=0)

Check that the dimensions are assigned correctly:

print(a[:,0,0,0])
[ 0  8 16]
print(a[0,:,0,0])
[0 4]
print(a[0,0,:,0])
[0 2]

print(b)
meas      m1      m2    
cond       A   B   A   B
time sub                
1    1     0   1   2   3
     2     4   5   6   7
2    1     8   9  10  11
     2    12  13  14  15
3    1    16  17  18  19
     2    20  21  22  23

print(c)
meas      m1      m2    
cond       A   B   A   B
sub time                
1   1      0   1   2   3
    2      8   9  10  11
    3     16  17  18  19
2   1      4   5   6   7
    2     12  13  14  15
    3     20  21  22  23
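
The manual check above can also be automated. The following is only a sketch under the assumption that the first two axes go to the rows and the last two to the columns; `b_direct` is a name made up for this illustration. It builds the same frame in one reshape (no `stack`) and compares every labelled cell with the array entry at the matching position.

```python
import itertools
import numpy as np
import pandas as pd

a = np.arange(24).reshape((3, 2, 2, 2))
time, sub, meas, cond = [1, 2, 3], [1, 2], ['m1', 'm2'], ['A', 'B']

# First two axes become row levels, last two become column levels.
# reshape() uses C order, the same order from_product enumerates labels in.
row_ind = pd.MultiIndex.from_product([time, sub], names=['time', 'sub'])
col_ind = pd.MultiIndex.from_product([meas, cond], names=['meas', 'cond'])
b_direct = pd.DataFrame(a.reshape(len(row_ind), len(col_ind)),
                        index=row_ind, columns=col_ind)

# Every labelled cell should equal the array entry at the matching positions.
for (i, t), (j, s), (k, m), (l, c) in itertools.product(
        enumerate(time), enumerate(sub), enumerate(meas), enumerate(cond)):
    assert b_direct.loc[(t, s), (m, c)] == a[i, j, k, l]
```

If the loop finishes without an AssertionError, labels and array positions line up.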

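You get the error because you reshaped the ndarray to 6x4 while passing an index designed to capture all dimensions in a single series (24 entries). The following setup gets the toy example working: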

a=np.arange(24).reshape((3,2,2,2))
iterables=[[1,2,3],[1,2],['m1','m2'],['A','B']]
index = pd.MultiIndex.from_product(iterables, names=['time','sub','meas','cond'])

pd.DataFrame(a.reshape(24, 1),index=index)
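
From that long format, one possible way to get back to a wide table is to unstack some of the index levels into the columns. This is a sketch, not necessarily what the asker had in mind; `s` and `wide` are names chosen here for illustration. Note that `unstack` sorts the labels of the levels it moves, which is harmless here because they are already sorted.

```python
import numpy as np
import pandas as pd

a = np.arange(24).reshape((3, 2, 2, 2))
iterables = [[1, 2, 3], [1, 2], ['m1', 'm2'], ['A', 'B']]
index = pd.MultiIndex.from_product(iterables, names=['time', 'sub', 'meas', 'cond'])

# One value per index entry; ravel() uses C order, matching from_product.
s = pd.Series(a.ravel(), index=index)

# Move the 'meas' and 'cond' levels from the rows into the columns.
wide = s.unstack(['meas', 'cond'])
print(wide)
```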

Solution

Here is a generic DataFrame creator that should get the job done:

def produce_df(rows, columns, row_names=None, column_names=None):
    """rows is a list of lists that will be used to build a MultiIndex
    columns is a list of lists that will be used to build a MultiIndex"""
    row_index = pd.MultiIndex.from_product(rows, names=row_names)
    col_index = pd.MultiIndex.from_product(columns, names=column_names)
    return pd.DataFrame(index=row_index, columns=col_index)

Demonstration

Without named index levels

produce_df([['a', 'b'], ['c', 'd']], [['1', '2'], ['3', '4']])

       1         2     
       3    4    3    4
a c  NaN  NaN  NaN  NaN
  d  NaN  NaN  NaN  NaN
b c  NaN  NaN  NaN  NaN
  d  NaN  NaN  NaN  NaN

With named index levels

produce_df([['a', 'b'], ['c', 'd']], [['1', '2'], ['3', '4']],
           row_names=['alpha1', 'alpha2'], column_names=['number1', 'number2'])

number1          1         2     
number2          3    4    3    4
alpha1 alpha2                    
a      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN
b      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN
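
The frames above are empty. To fill them from an ndarray, the same idea can be combined with a transpose and a reshape. The `nd2df` below is a hypothetical helper named after the call in the question, with a slightly different signature (`labels` and `names` are given per array axis, which is an assumption of this sketch); it is not an existing pandas function.

```python
import numpy as np
import pandas as pd

def nd2df(arr, labels, names, dim2row, dim2col):
    """Hypothetical helper: labels[i] and names[i] describe axis i of arr."""
    order = list(dim2row) + list(dim2col)
    # Bring the row axes to the front and the column axes to the back.
    moved = np.transpose(arr, order)
    row_index = pd.MultiIndex.from_product(
        [labels[i] for i in dim2row], names=[names[i] for i in dim2row])
    col_index = pd.MultiIndex.from_product(
        [labels[i] for i in dim2col], names=[names[i] for i in dim2col])
    # Collapse to 2-D: one row per row-label combination,
    # one column per column-label combination.
    data = moved.reshape(len(row_index), len(col_index))
    return pd.DataFrame(data, index=row_index, columns=col_index)

myarr = np.random.rand(10, 4, 3, 2)  # axes: time, sub, cond, measure
labels = [list(range(1, 11)), list(range(1, 5)), ['A', 'B', 'C'], ['meas1', 'meas2']]
names = ['time', 'sub', 'cond', 'measure']

# Rows: sub (outer), time (inner); columns: measure (outer), cond (inner).
df = nd2df(myarr, labels, names, dim2row=[1, 0], dim2col=[3, 2])
```

The order of `dim2row` and `dim2col` decides which level ends up outermost, so no `swaplevel` is needed afterwards.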

Based on your data structure,

names=['sub','time','measure','cond']  #ind1,ind2,col1,col2
labels=[[1,2,3],[1,2],['meas1','meas2'],list('ABC')]

A straightforward way to achieve your goal:

index = pd.MultiIndex.from_product(labels,names=names)
data=np.arange(index.size) # or myarr.flatten(), if its axes follow the order of names

df=pd.DataFrame(data,index=index)
df22=df.reset_index().pivot_table(values=0,index=names[:2],columns=names[2:])


"""
measure  meas1         meas2        
cond         A   B   C     A   B   C
sub time                            
1   1        0   1   2     3   4   5
    2        6   7   8     9  10  11
2   1       12  13  14    15  16  17
    2       18  19  20    21  22  23
3   1       24  25  26    27  28  29
    2       30  31  32    33  34  35

"""
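
One caveat when feeding the real array into this approach: the flattened values must follow the same axis order as `names`. The sketch below assumes the `myarr` from the question, whose axes are (time, sub, cond, measure), and adapts the label lists to its actual shape; the variable `reordered` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

myarr = np.random.rand(10, 4, 3, 2)  # axes: time, sub, cond, measure
names = ['sub', 'time', 'measure', 'cond']
labels = [list(range(1, 5)), list(range(1, 11)), ['meas1', 'meas2'], list('ABC')]

# Reorder the axes to (sub, time, measure, cond) so flatten() matches `names`.
reordered = myarr.transpose(1, 0, 3, 2)

index = pd.MultiIndex.from_product(labels, names=names)
df = pd.DataFrame(reordered.flatten(), index=index)
df22 = df.reset_index().pivot_table(values=0, index=names[:2], columns=names[2:])
```

Without the transpose the frame would still be built, but the values would sit under the wrong labels, so a spot check against the array is worthwhile.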