OLS 回归与 groupby
OLS Regression with groupby
我想 运行 使用 pandas 和 groupby 进行 OLS 回归。
我正在尝试以下代码:
import pandas as pd
from pandas.stats.api import ols
df=pd.read_csv(r'F:\File.csv')
result=df.groupby(['FID']).apply(lambda x: ols(y=df[x['MEAN']], x=df[x['Accum_Prcp'],x['Accum_HDD']]))
print result
但是这个returns:
File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\indexing.py", line 1150, in _convert_to_indexer
raise KeyError('%s not in index' % objarr[mask])
KeyError: '[ 0.84978328 0.72115778 0.53965104 0.52955655 0.73372541 0.64617074\n 0.60040938 0.7147218 0.65533535 0.57980322 0.57382068 0.56543435\n 0.70740831 0.9245337 0.54859569 0.6789395 0.7086157 0.3835853\n 0.54924104 0.80813778 0.83758118 0.22673391 0.26594087 0.63650468\n 0.89889911 0.38324657 0.30235986 0.62922678 0.55219822 0.55950705\n 0.71137557 0.53631811 0.70158798 0.87116361 0.93751381 0.91125518\n 0.80020908 0.75301262 0.82391046 0.77483673 0.63069573 0.44954455\n 0.83578862 0.56338649 0.64236039 0.93270243 0.93077291 0.83847668\n 0.8268959 0.85400317 0.74319769 0.94803537 0.97484929 0.45366017\n 0.80823694 0.82028051 0.63960395 0.63015722 0.73132888 0.55570184\n 0.83265402 0.75009687 0.58207032 0.92064804 0.91058008 0.86726397\n 0.89204098 0.95573514 0.75704367 0.80786363 0.87448548 0.7553715\n 0.88965962 0.82828493 0.82423891 0.81034742 0.90104876 0.78875473\n 0.97369268] not in index'
我的语法有什么不正确的地方吗?
在没有 groupby 的情况下执行此操作将是这样的:
result = ols(y=df['MEAN'], x=df[['Accum_HDD','Accum_Prcp']])
并且工作正常。
我的数据框看起来像这样:
FID Image_Date MEAN Accum_Prcp Accum_HDD
1 19920506 2.0 500.0 1000.0
1 19930506 1.7 450.0 1050.0
2 19920506 2.7 456.0 992.0
2 19930506 1.9 376.0 800.0
尝试:
grps=df.groupby(['FID'])
for fid, grp in grps:
ols(y=grp.loc[:, 'MEAN'], x=grp.loc[:, ['Accum_Prcp', 'Accum_HDD']])
我想 运行 使用 pandas 和 groupby 进行 OLS 回归。
我正在尝试以下代码:
import pandas as pd
from pandas.stats.api import ols
df=pd.read_csv(r'F:\File.csv')
result=df.groupby(['FID']).apply(lambda x: ols(y=df[x['MEAN']], x=df[x['Accum_Prcp'],x['Accum_HDD']]))
print result
但是这个returns:
File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\indexing.py", line 1150, in _convert_to_indexer
raise KeyError('%s not in index' % objarr[mask])
KeyError: '[ 0.84978328 0.72115778 0.53965104 0.52955655 0.73372541 0.64617074\n 0.60040938 0.7147218 0.65533535 0.57980322 0.57382068 0.56543435\n 0.70740831 0.9245337 0.54859569 0.6789395 0.7086157 0.3835853\n 0.54924104 0.80813778 0.83758118 0.22673391 0.26594087 0.63650468\n 0.89889911 0.38324657 0.30235986 0.62922678 0.55219822 0.55950705\n 0.71137557 0.53631811 0.70158798 0.87116361 0.93751381 0.91125518\n 0.80020908 0.75301262 0.82391046 0.77483673 0.63069573 0.44954455\n 0.83578862 0.56338649 0.64236039 0.93270243 0.93077291 0.83847668\n 0.8268959 0.85400317 0.74319769 0.94803537 0.97484929 0.45366017\n 0.80823694 0.82028051 0.63960395 0.63015722 0.73132888 0.55570184\n 0.83265402 0.75009687 0.58207032 0.92064804 0.91058008 0.86726397\n 0.89204098 0.95573514 0.75704367 0.80786363 0.87448548 0.7553715\n 0.88965962 0.82828493 0.82423891 0.81034742 0.90104876 0.78875473\n 0.97369268] not in index'
我的语法有什么不正确的地方吗?
在没有 groupby 的情况下执行此操作将是这样的:
result = ols(y=df['MEAN'], x=df[['Accum_HDD','Accum_Prcp']])
并且工作正常。
我的数据框看起来像这样:
FID Image_Date MEAN Accum_Prcp Accum_HDD
1 19920506 2.0 500.0 1000.0
1 19930506 1.7 450.0 1050.0
2 19920506 2.7 456.0 992.0
2 19930506 1.9 376.0 800.0
尝试:
grps=df.groupby(['FID'])
for fid, grp in grps:
ols(y=grp.loc[:, 'MEAN'], x=grp.loc[:, ['Accum_Prcp', 'Accum_HDD']])