Panel data with multiindex
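I would like to ask how to work with panel data, or how to transform a dataset so that it can be modelled as panel data when it has a multi-index.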
library(tibble)
library(dplyr)  # select() and %>% are used below
library(plm)
library(fastDummies)
dataset <- tribble(
~country, ~year, ~sex, ~age, ~suicides_no,
"Albania", 1987, "male", "15-24", 50,
"Albania", 1987, "male", "35-50", 20,
"Albania", 1987, "male", "50-", 11,
"Albania", 1987, "female", "15-24", 18,
"Albania", 1987, "female", "35-50", 2,
"Albania", 1987, "female", "50-", 1,
"Albania", 1988, "male", "15-24", 50,
"Albania", 1988, "male", "35-50", 2,
"Albania", 1988, "male", "50-", 11,
"Albania", 1988, "female", "15-24", 17,
"Albania", 1988, "female", "35-50", 20,
"Albania", 1988, "female", "50-", 10,
"Albania", 1989, "male", "15-24", 0,
"Albania", 1989, "male", "35-50", 2,
"Albania", 1989, "male", "50-", 1,
"Albania", 1989, "female", "15-24", 7,
"Albania", 1989, "female", "35-50", 2,
"Albania", 1989, "female", "50-", 1,
"Germany", 1987, "male", "15-24", 50,
"Germany", 1987, "male", "35-50", 2,
"Germany", 1987, "male", "50-", 11,
"Germany", 1987, "female", "15-24", 18,
"Germany", 1987, "female", "35-50", 20,
"Germany", 1987, "female", "50-", 1,
"Germany", 1988, "male", "15-24", 0,
"Germany", 1988, "male", "35-50", 2,
"Germany", 1988, "male", "50-", 110,
"Germany", 1988, "female", "15-24", 17,
"Germany", 1988, "female", "35-50", 20,
"Germany", 1988, "female", "50-", 10,
"Germany", 1989, "male", "15-24", 0,
"Germany", 1989, "male", "35-50", 20,
"Germany", 1989, "male", "50-", 1,
"Germany", 1989, "female", "15-24", 73,
"Germany", 1989, "female", "35-50", 2,
"Germany", 1989, "female", "50-", 11
)
dataset %>% tail
dataset2 <- dummy_cols(dataset, "age") %>% select(-age)
panel <- pdata.frame(dataset2, index = c("country", "year"))
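Because of the age intervals, we have multiple observations for one cross-sectional unit within a single year.
How can we transform this dataset so that it can be used as panel data with random or fixed effects?
Using: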
library(plm)
fixex = plm(suicides_no ~ factor(sex) + factor(age), index = c("country", "year"), data = dataset, model = "within")
does not work. How can I transform the data so that the model can be estimated?
The plm() function requires unique combinations of ID and time, as the error message "duplicate couples (id-time)" indicates. When you run:
library(dplyr)
dataset %>%
count(country, year)
you can see that every combination of country and year has six observations:
  country  year     n
  <chr>   <dbl> <int>
1 Albania  1987     6
2 Albania  1988     6
3 Albania  1989     6
4 Germany  1987     6
5 Germany  1988     6
6 Germany  1989     6
To avoid this, you need to create unique IDs. I assume they can be built from country, age and sex. Then you can do:
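library(broom)
dataset %>%
  # one ID per combination of country, sex and age
  mutate(ID = group_indices(., !!!select(., -suicides_no, -year))) %>%
  mutate_at(vars(sex, age), as.factor) %>%
  # note: plm reads index as c(individual, time), so with this ordering
  # "year" is treated as the individual dimension
  do(tidy(plm(suicides_no ~ sex + age,
              index = c("year", "ID"),
              model = "within",
              data = .)))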
  term     estimate std.error statistic p.value
  <chr>       <dbl>     <dbl>     <dbl>   <dbl>
1 sexmale      5.17      7.82     0.661   0.514
2 age35-50   -15.5       9.57    -1.62    0.116
3 age50-     -10.1       9.57    -1.05    0.301
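As a quick sanity check on the ID idea above, here is a minimal base-R sketch. It rebuilds only the key columns of `dataset` (the `keys` data frame and the variables `dup_before`, `dup_after` and `n_units` are names made up for this illustration) and uses `interaction()` instead of `group_indices()`, which is a substitution on my part rather than what the answer does. It shows that (country, year) pairs repeat, while (ID, year) pairs do not once the ID combines country, sex and age.

```r
# Key columns only, mirroring the structure of `dataset` above
# (2 countries x 3 years x 2 sexes x 3 age groups = 36 rows).
keys <- expand.grid(
  country = c("Albania", "Germany"),
  year    = 1987:1989,
  sex     = c("male", "female"),
  age     = c("15-24", "35-50", "50-"),
  stringsAsFactors = FALSE
)

# Before: each (country, year) pair occurs more than once,
# which is what triggers "duplicate couples (id-time)".
dup_before <- anyDuplicated(keys[, c("country", "year")]) > 0

# Build one ID per combination of country, sex and age.
keys$ID <- interaction(keys$country, keys$sex, keys$age, drop = TRUE)

# After: each (ID, year) pair occurs exactly once.
dup_after <- anyDuplicated(keys[, c("ID", "year")]) > 0

# Number of distinct cross-sectional units created this way.
n_units <- nlevels(keys$ID)

print(c(dup_before = dup_before, dup_after = dup_after))
print(n_units)
```

The same check can be run on the real `dataset` before calling plm(), so that a duplicated id-time pair is caught early rather than through the error message.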