Invert Pandas Multiindexed DataFrame

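I am trying to do a few things with a Pandas MultiIndexed DataFrame. The first is to invert a matrix by date. That is, for each date (the first level of my index), I want to compute the inverse of that date's block of rows. Here is a test DataFrame: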

import os
import numpy as np
import scipy as sp
import pandas as pd
import datetime as dt
import sys

# Create fake df
np.random.seed( 666 )
dt_lst    = pd.date_range( start='2017-01-01', end='2017-06-30' )[:100]
df        = pd.DataFrame({'date':dt_lst,'river':1,'RAND1':np.random.random(size=100),
                     'RAND2':100.0*np.random.random(size=100)})
df2       = df.copy()
df2['river'] = 2
df2['RAND1'] = 4.0 * df2['RAND1']
df2['RAND2'] = 3.0 * df2['RAND2'] 
df = df.set_index(['date','river'])
df2 = df2.set_index(['date','river'])
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
dforig = pd.concat([df, df2]).sort_index(level='date')
dforig['RAND3'] = dforig['RAND2'] / dforig['RAND1']
del df,df2 

Now, for each date, I want to invert the matrix.

dfinv = pd.DataFrame( np.linalg.pinv(dforig.values), index=dforig.index, columns=dforig.columns ) 

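Obviously, this is wrong: it treats the whole frame as a single matrix instead of one matrix per date. I was hoping to get advice on an efficient strategy for doing this by date (split-apply-combine?). Or am I really better off just writing a loop here, where I slice out each date and rebuild the inverted df?

Any thoughts or pointers would be much appreciated.

Cheers!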

You need to use groupby with apply and a custom function that calls np.linalg.pinv:

np.random.seed( 666 )
dt_lst    = pd.date_range( start='2017-01-01', end='2017-06-30' )[:5]
df        = pd.DataFrame({'date':dt_lst,'river':1,'RAND1':np.random.random(size=5),
                     'RAND2':100.0*np.random.random(size=5)})
df2       = df.copy()
df2['river'] = 2
df2['RAND1'] = 4.0 * df2['RAND1']
df2['RAND2'] = 3.0 * df2['RAND2'] 
df = df.set_index(['date','river'])
df2 = df2.set_index(['date','river'])
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
dforig = pd.concat([df, df2]).sort_index(level='date')
dforig['RAND3'] = dforig['RAND2'] / dforig['RAND1']
print (dforig)
                     RAND1       RAND2      RAND3
date       river                                 
2017-01-01 1      0.700437    1.270320   1.813610
           2      2.801748    3.810959   1.360207
2017-01-02 1      0.844187   41.358770  48.992448
           2      3.376747  124.076310  36.744336
2017-01-03 1      0.676514    4.881279   7.215338
           2      2.706057   14.643838   5.411503
2017-01-04 1      0.727858    9.992856  13.729128
           2      2.911432   29.978568  10.296846
2017-01-05 1      0.951458   50.806631  53.398713
           2      3.805832  152.419892  40.049035
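Before grouping, it may help to see what the pseudo-inverse of a single date's block looks like. The sketch below uses a small made-up frame (the name `demo` and its values are illustrative only, not the data above) and pulls one date out with `xs`. Each block is 2 x 3 (rivers by columns), so its pseudo-inverse is 3 x 2.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame: two dates, two rivers, three columns.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(['2017-01-01', '2017-01-02']), [1, 2]],
    names=['date', 'river'])
demo = pd.DataFrame(
    [[1.0, 2.0, 2.0], [4.0, 6.0, 1.5], [2.0, 8.0, 4.0], [8.0, 24.0, 3.0]],
    index=idx, columns=['RAND1', 'RAND2', 'RAND3'])

# xs drops the selected level, leaving a 2 x 3 block indexed by river.
block = demo.xs(pd.Timestamp('2017-01-01'), level='date')
block_pinv = np.linalg.pinv(block.values)  # shape (3, 2)

print(block.shape, block_pinv.shape)
# The two rows are linearly independent, so block @ pinv(block)
# should be (numerically) the 2 x 2 identity.
print(np.allclose(block.values @ block_pinv, np.eye(2)))
```

Because the block is not square, there is no ordinary inverse; `pinv` returns the Moore-Penrose pseudo-inverse, which is why the shape flips.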

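def f(x):
    # Drop the date level so the block is indexed by river only
    x = x.reset_index(drop=True, level=0)
    # pinv of an (n_rivers x n_cols) block is (n_cols x n_rivers),
    # so the columns become the index and the rivers become the columns
    x = pd.DataFrame(np.linalg.pinv(x.values), index=x.columns, columns=x.index)
    return x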

df_inv = dforig.groupby(level=0).apply(f)
print (df_inv)
river                    1         2
date                                
2017-01-01 RAND1 -0.201457  0.192762
           RAND2 -0.101951  0.196343
           RAND3  0.700602 -0.211973
2017-01-02 RAND1 -0.000446  0.000386
           RAND2 -0.008046  0.010735
           RAND3  0.027212 -0.009069
2017-01-03 RAND1 -0.020513  0.019960
           RAND2 -0.064182  0.087055
           RAND3  0.183937 -0.060765
2017-01-04 RAND1 -0.005731  0.005380
           RAND2 -0.032754  0.043910
           RAND3  0.096982 -0.032246
2017-01-05 RAND1 -0.000375  0.000302
           RAND2 -0.006551  0.008740
           RAND3  0.024966 -0.008321
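If the groupby turns out to be slow on many dates, one possible alternative is to reshape the values into a 3-D array and let `np.linalg.pinv` handle the whole stack in one call, since it accepts arrays of shape `(..., M, N)`. This is only a sketch: the helper name `pinv_by_date` and the `demo` frame are made up for illustration, and it assumes every date has exactly the same set of rivers, each appearing once.

```python
import numpy as np
import pandas as pd

def pinv_by_date(frame):
    """Pseudo-invert each date's (river x column) block in one stacked call.

    Assumption: level 0 is the date, level 1 is the river, and every date
    has the same rivers exactly once (otherwise the reshape is invalid).
    """
    frame = frame.sort_index()  # date-major, river-minor row order
    dates = frame.index.get_level_values(0).unique()
    rivers = frame.index.get_level_values(1).unique()
    stacked = frame.values.reshape(len(dates), len(rivers), frame.shape[1])
    inv = np.linalg.pinv(stacked)  # shape (n_dates, n_cols, n_rivers)
    out_index = pd.MultiIndex.from_product(
        [dates, frame.columns], names=[frame.index.names[0], None])
    return pd.DataFrame(inv.reshape(-1, len(rivers)),
                        index=out_index, columns=rivers)

# Hypothetical miniature frame: two dates, two rivers, three columns.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(['2017-01-01', '2017-01-02']), [1, 2]],
    names=['date', 'river'])
demo = pd.DataFrame(
    [[1.0, 2.0, 2.0], [4.0, 6.0, 1.5], [2.0, 8.0, 4.0], [8.0, 24.0, 3.0]],
    index=idx, columns=['RAND1', 'RAND2', 'RAND3'])

out = pinv_by_date(demo)
print(out)
```

The result has the same layout as `df_inv` above (date and column name in the index, rivers as columns). If dates can have differing numbers of rivers, the reshape no longer applies and the groupby version is the safer choice.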