Panel data regression with fixed effects using Python
I have the following panel stored in `df`:
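| | state | district | year | y | constant | x1 | x2 | time |
|---|---|---|---|---|---|---|---|---|
| 0 | 01 | 01001 | 2009 | 12 | 1 | 0.956007 | 639673 | 1 |
| 1 | 01 | 01001 | 2010 | 20 | 1 | 0.972175 | 639673 | 2 |
| 2 | 01 | 01001 | 2011 | 22 | 1 | 0.988343 | 639673 | 3 |
| 3 | 01 | 01002 | 2009 | 0 | 1 | 0 | 33746 | 1 |
| 4 | 01 | 01002 | 2010 | 1 | 1 | 0.225071 | 33746 | 2 |
| 5 | 01 | 01002 | 2011 | 5 | 1 | 0.450142 | 33746 | 3 |
| 6 | 01 | 01003 | 2009 | 0 | 1 | 0 | 45196 | 1 |
| 7 | 01 | 01003 | 2010 | 5 | 1 | 0.427477 | 45196 | 2 |
| 8 | 01 | 01003 | 2011 | 9 | 1 | 0.854955 | 45196 | 3 |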
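- `y` is the number of protests in each district
- `constant` is a column of ones
- `x1` is the proportion of the district's area covered by a mobile network provider
- `x2` is the population of each district (note that it is fixed over time)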
How can I run the following model in Python?
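The model equation itself is not reproduced in the text. Reading it off the attempt that follows (the `x2`-by-`time` interaction, the district effects `delta`, and the state-year effects `eta`), it is presumably of the form shown here. The coefficient names $\beta_0, \beta_1, \beta_2$ and the index notation are assumptions made for illustration, not the original equation:

$$
y_{it} = \beta_0 + \beta_1 \, x_{1,it} + \beta_2 \, (x_{2,i} \cdot t) + \delta_i + \eta_{st} + \varepsilon_{it}
$$

Here $i$ indexes districts, $s$ is the state containing district $i$, $t$ is the period (the `time` column), $\delta_i$ is a district fixed effect, and $\eta_{st}$ is a state-year fixed effect.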
This is what I tried:
import pandas as pd
# Transform `x2` to match model
df['x2'] = df['x2'].multiply(df['time'], axis=0)
# District fixed effects
df['delta'] = pd.Categorical(df['district'])
# State-time fixed effects
df['eta'] = pd.Categorical(df['state'] + df['year'].astype(str))
# Set indexes (set_index returns a new DataFrame, so assign the result)
df = df.set_index(['district','year'])
from linearmodels.panel import PanelOLS
m = PanelOLS(dependent=df['y'], exog=df[['constant','x1','x2','delta','eta']])
ValueError: exog does not have full column rank. If you wish to proceed with model estimation irrespective of the numerical accuracy of coefficient estimates, you can set rank_check=False.
What am I doing wrong?
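I dug into the documentation, and the solution turns out to be quite simple.
After setting the indexes and converting the fixed-effect columns to `pandas.Categorical` (see the question above):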
# Import model
from linearmodels.panel import PanelOLS
# Model
m = PanelOLS(dependent=df['y'],
             exog=df[['constant','x1','x2']],
             entity_effects=True,
             time_effects=False,
             other_effects=df['eta'])
m.fit(cov_type='clustered', cluster_entity=True)
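That is, do NOT pass your fixed-effect columns to `exog`.
Instead, request them through `entity_effects` (boolean, based on the entity level of the index), `time_effects` (boolean, based on the time level of the index), or `other_effects` (`pandas.Categorical`).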
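As a rough illustration of what absorbing entity effects amounts to, the sketch below uses synthetic data. All names and numbers in it are made up for the example, and it does not reproduce linearmodels' internals. It shows that demeaning within each entity gives the same slope as including one dummy per entity explicitly, and that adding a constant on top of a full set of dummies leaves the design matrix rank-deficient. That is one common way to hit a rank error like the one quoted in the question, although the excerpt does not show whether it is what happened there.

```python
import numpy as np
import pandas as pd

# Synthetic panel (hypothetical data): 5 entities observed over 4 periods.
rng = np.random.default_rng(0)
n_entities, n_periods = 5, 4
entity = np.repeat(np.arange(n_entities), n_periods)
x = rng.normal(size=n_entities * n_periods)
alpha = rng.normal(size=n_entities)  # entity-specific intercepts
y = 2.0 * x + alpha[entity] + rng.normal(scale=0.1, size=x.size)
panel = pd.DataFrame({"entity": entity, "x": x, "y": y})

# (a) Within estimator: subtract each entity's mean, then regress y on x.
means = panel.groupby("entity")[["x", "y"]].transform("mean")
demeaned = panel[["x", "y"]] - means
beta_within = np.linalg.lstsq(
    demeaned[["x"]].to_numpy(), demeaned["y"].to_numpy(), rcond=None
)[0][0]

# (b) Dummy-variable regression: x plus one indicator per entity, no constant.
dummies = pd.get_dummies(panel["entity"], dtype=float)
design = np.column_stack([panel["x"].to_numpy(), dummies.to_numpy()])
beta_lsdv = np.linalg.lstsq(design, panel["y"].to_numpy(), rcond=None)[0][0]

# (c) A constant plus a full dummy set is collinear: the dummies sum to one.
with_const = np.column_stack([np.ones(len(panel)), design])
rank = np.linalg.matrix_rank(with_const)

print("within slope:", beta_within)
print("dummy-variable slope:", beta_lsdv)
print("columns:", with_const.shape[1], "rank:", rank)
```

The two slopes agree up to floating-point error, which is why the effects can be requested through flags instead of being written out as columns of `exog`.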