Concatenating DataFrame columns for statsmodels.OLS
If I want to build a model on the logs of Y and X2, I do this:
import statsmodels.formula.api as smf
import numpy as np
import pandas as pd
d = {'Y': [1,2,3,4], 'X1': [5,6,7,8], 'X2': [9,10,11,12]}
df = pd.DataFrame(d)
model = smf.ols(formula='np.log(Y) ~ X1 + np.log(X2)', data=df).fit()
How do I do the same thing with statsmodels.api? I know I could concatenate columns into a new DataFrame, but surely there is a simpler way.
import statsmodels.api as sm
import numpy as np
import pandas as pd
d = {'Y': [1,2,3,4], 'X1': [5,6,7,8], 'X2': [9,10,11,12]}
df = pd.DataFrame(d)
y = np.log(df['Y'])
x = pd.DataFrame()
x['X1'] = d['X1']
x['logX2'] = np.log(d['X2'])
#x = df[['X1', np.log('X2')]] # I'd like to type sth like this
x = sm.add_constant(x)
model = sm.OLS(y, x).fit()
model.summary()
On the commented-out line (x = df...) I get:
TypeError: Not implemented for this type
You can build x directly with pd.DataFrame:
x = pd.DataFrame({'X1': df['X1'], 'log(X2)': np.log(df['X2'])})
instead of
x = pd.DataFrame()
x['X1'] = d['X1']
x['logX2'] = np.log(d['X2'])
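As a side note, the commented-out line in the question fails because np.log('X2') is applied to the string 'X2' rather than to the column df['X2'], and NumPy has no logarithm for strings. Below is a minimal sketch of a one-expression alternative, assuming DataFrame.assign is acceptable here. The names raised and x_alt are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Y': [1, 2, 3, 4], 'X1': [5, 6, 7, 8], 'X2': [9, 10, 11, 12]})

# np.log on the *string* 'X2' raises TypeError; it never looks at the column.
try:
    np.log('X2')
    raised = False
except TypeError:
    raised = True

# Select the untransformed column, then attach the transformed one.
# (Illustrative alternative, not part of the original answer.)
x_alt = df[['X1']].assign(logX2=np.log(df['X2']))
print(raised, list(x_alt.columns))
```

x_alt can then be passed to sm.add_constant and sm.OLS in the same way as x.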
import numpy as np
import pandas as pd
import statsmodels.api as sm
d = {'Y': [1,2,3,4], 'X1': [5,6,7,8], 'X2': [9,10,11,12]}
df = pd.DataFrame(d)
y = np.log(df['Y'])
x = pd.DataFrame({'X1': df['X1'], 'log(X2)': np.log(df['X2'])})
x = sm.add_constant(x)
model = sm.OLS(y, x).fit()
print(model.summary())