Invert Pandas Multiindexed DataFrame
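I'm trying to do a few things with a Pandas multi-indexed DataFrame. The first is to invert a matrix by date. That is, for each date (in my index), I want to find the inverse of that date's block of rows. Here is a test DataFrame: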
import os
import numpy as np
import scipy as sp
import pandas as pd
import datetime as dt
import sys
# Create fake df
np.random.seed( 666 )
dt_lst = pd.date_range( start='2017-01-01', end='2017-06-30' )[:100]
df = pd.DataFrame({'date':dt_lst,'river':1,'RAND1':np.random.random(size=100),
'RAND2':100.0*np.random.random(size=100)})
df2 = df.copy()
df2['river'] = 2
df2['RAND1'] = 4.0 * df2['RAND1']
df2['RAND2'] = 3.0 * df2['RAND2']
df = df.set_index(['date','river'])
df2 = df2.set_index(['date','river'])
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames the same way
dforig = pd.concat([df, df2]).sort_index(level='date')
dforig['RAND3'] = dforig['RAND2'] / dforig['RAND1']
del df,df2
Now, for each date, I want to invert the matrix.
dfinv = pd.DataFrame( np.linalg.pinv(dforig.values), index=dforig.index, columns=dforig.columns )
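Obviously, this is wrong: it inverts the whole frame at once instead of one date at a time. I'm hoping for suggestions on an efficient strategy for doing this by date (split-apply-combine?). Or am I really better off with a loop here, where I slice out each date and rebuild the inverted df?
Any thoughts or pointers would be appreciated.
Cheers!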
You need groupby with apply and a custom function that calls np.linalg.pinv on each date's block:
np.random.seed( 666 )
dt_lst = pd.date_range( start='2017-01-01', end='2017-06-30' )[:5]
df = pd.DataFrame({'date':dt_lst,'river':1,'RAND1':np.random.random(size=5),
'RAND2':100.0*np.random.random(size=5)})
df2 = df.copy()
df2['river'] = 2
df2['RAND1'] = 4.0 * df2['RAND1']
df2['RAND2'] = 3.0 * df2['RAND2']
df = df.set_index(['date','river'])
df2 = df2.set_index(['date','river'])
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames the same way
dforig = pd.concat([df, df2]).sort_index(level='date')
dforig['RAND3'] = dforig['RAND2'] / dforig['RAND1']
print (dforig)
                     RAND1       RAND2      RAND3
date       river
2017-01-01 1      0.700437    1.270320   1.813610
           2      2.801748    3.810959   1.360207
2017-01-02 1      0.844187   41.358770  48.992448
           2      3.376747  124.076310  36.744336
2017-01-03 1      0.676514    4.881279   7.215338
           2      2.706057   14.643838   5.411503
2017-01-04 1      0.727858    9.992856  13.729128
           2      2.911432   29.978568  10.296846
2017-01-05 1      0.951458   50.806631  53.398713
           2      3.805832  152.419892  40.049035
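def f(x):
    # Drop the date level so the rows of the block are labeled by river only
    x = x.reset_index(drop=True, level=0)
    # Pseudo-inverse of the (river x column) block; row and column labels swap
    x = pd.DataFrame(np.linalg.pinv(x.values), index=x.columns, columns=x.index)
    return x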
df_inv = dforig.groupby(level=0).apply(f)
print (df_inv)
river                    1         2
date
2017-01-01 RAND1 -0.201457  0.192762
           RAND2 -0.101951  0.196343
           RAND3  0.700602 -0.211973
2017-01-02 RAND1 -0.000446  0.000386
           RAND2 -0.008046  0.010735
           RAND3  0.027212 -0.009069
2017-01-03 RAND1 -0.020513  0.019960
           RAND2 -0.064182  0.087055
           RAND3  0.183937 -0.060765
2017-01-04 RAND1 -0.005731  0.005380
           RAND2 -0.032754  0.043910
           RAND3  0.096982 -0.032246
2017-01-05 RAND1 -0.000375  0.000302
           RAND2 -0.006551  0.008740
           RAND3  0.024966 -0.008321
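
If the per-group apply turns out to be slow on many dates, one possible alternative is to hand NumPy all the blocks at once: np.linalg.pinv accepts a stack of matrices and pseudo-inverts each one. The following is only a sketch under assumptions that are not part of the answer above: every date must carry exactly the same rivers, the index must be sorted by date and then river, and the helper name pinv_by_date and the small stand-in frame are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for dforig (hypothetical data): 3 dates x 2 rivers, 3 value columns.
rng = np.random.default_rng(0)
sample_dates = pd.date_range("2017-01-01", periods=3)
sample_index = pd.MultiIndex.from_product(
    [sample_dates, [1, 2]], names=["date", "river"]
)
dforig = pd.DataFrame(
    rng.random((6, 3)) + 0.1,
    index=sample_index,
    columns=["RAND1", "RAND2", "RAND3"],
)


def pinv_by_date(df):
    """Pseudo-invert each date's (river x column) block with one NumPy call.

    Assumes a two-level index (date, river), sorted, where every date has
    the same rivers in the same order. Raises ValueError otherwise.
    """
    dates = df.index.get_level_values(0).unique()
    rivers = df.index.get_level_values(1).unique()
    expected = pd.MultiIndex.from_product([dates, rivers])
    if not df.index.equals(expected):
        raise ValueError("index must be a full, sorted date x river grid")

    n_dates, n_rivers, n_cols = len(dates), len(rivers), df.shape[1]
    # One (n_rivers x n_cols) matrix per date, stacked along axis 0.
    blocks = df.to_numpy().reshape(n_dates, n_rivers, n_cols)
    # pinv works on the last two axes, giving shape (n_dates, n_cols, n_rivers).
    inv = np.linalg.pinv(blocks)

    out_index = pd.MultiIndex.from_product(
        [dates, df.columns], names=[df.index.names[0], None]
    )
    return pd.DataFrame(
        inv.reshape(n_dates * n_cols, n_rivers), index=out_index, columns=rivers
    )


df_inv_fast = pinv_by_date(dforig)
print(df_inv_fast)
```

The result has the same layout as df_inv above: dates and original column names in the row index, rivers as columns. When the dates do not all share the same rivers, the reshape is not valid and the groupby/apply version remains the safer choice.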