R: Propensity Score Matching using MatchIt. How to find the number of matched observations with replace = TRUE?
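Consider the following:
I am matching data with the MatchIt package in R. I have fewer controls than treated units, so I use the option replace = TRUE. According to the manual, the weights tell us how often each control was matched.
From the manual:
"For matching with replacement, use replace = TRUE. After matching with replacement, the weights can be used to reflect the frequency with which each control unit was matched."
However, I do not understand why the weights can be fractional, or how fractional weights reflect a frequency.
For example, I added replace = TRUE to the example in the manual (see page 18):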
library("dplyr")
library("MatchIt")
m.out1 <- matchit(treat ~ re74 + re75 + age + educ, data = lalonde,
                  method = "nearest", distance = "logit", replace = TRUE)
tail(match.data(m.out1), 15)
#> treat age educ black hispan married nodegree re74 re75 re78
#> PSID388 0 19 11 1 0 0 1 0 0 16485.520
#> PSID390 0 48 13 0 0 1 0 0 0 0.000
#> PSID392 0 17 10 1 0 0 1 0 0 0.000
#> PSID393 0 38 12 0 0 1 0 0 0 18756.780
#> PSID396 0 48 14 0 0 1 0 0 0 7236.427
#> PSID398 0 17 8 1 0 0 1 0 0 4520.366
#> PSID400 0 37 8 1 0 0 1 0 0 648.722
#> PSID401 0 17 10 1 0 0 1 0 0 1053.619
#> PSID407 0 23 12 0 0 0 0 0 0 3902.676
#> PSID409 0 17 10 0 0 0 1 0 0 14942.770
#> PSID411 0 18 10 1 0 0 1 0 0 5306.516
#> PSID413 0 17 10 0 0 1 1 0 0 3859.822
#> PSID419 0 51 4 1 0 0 1 0 0 0.000
#> PSID423 0 27 10 1 0 0 1 0 0 7543.794
#> PSID425 0 18 11 0 0 0 1 0 0 10150.500
#> distance weights
#> PSID388 0.4067545 0.6
#> PSID390 0.4042321 1.2
#> PSID392 0.3974677 0.6
#> PSID393 0.4016920 4.2
#> PSID396 0.4152715 0.6
#> PSID398 0.3758217 1.8
#> PSID400 0.3595084 0.6
#> PSID401 0.3974677 1.2
#> PSID407 0.4144044 1.8
#> PSID409 0.3974677 0.6
#> PSID411 0.3966277 1.2
#> PSID413 0.3974677 1.2
#> PSID419 0.3080590 0.6
#> PSID423 0.3890954 1.2
#> PSID425 0.4076015 1.2
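The control "PSID393" has a weight of about 4.2, so I assumed this control was matched 4 or 5 times (after rounding).
However, we can also look at match.matrix to see the matched treated and control units one by one. Filtering for "PSID393" shows that this control was actually matched 7 times: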
m.out1$match.matrix %>% data.frame() %>% filter(X1 == "PSID393")
#> X1
#> 1 PSID393
#> 2 PSID393
#> 3 PSID393
#> 4 PSID393
#> 5 PSID393
#> 6 PSID393
#> 7 PSID393
Created on 2019-05-06 by the reprex package (v0.2.1)
How do I interpret these two outputs correctly?
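The weights are scaled so that they sum to the number of unique matched observations in the control group. With your example data, note that within each group the sum of the weights equals the number of observations, so the mean weight is 1. In addition, the weight of the most frequently used control is seven times that of the least frequently used one: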
match.data(m.out1) %>%
  group_by(treat) %>%
  summarise(min.weight = min(weights),
            max.weight = max(weights),
            mean.weight = mean(weights),
            sum.weights = sum(weights),
            n = n(),
            max.match.ratio = max.weight / min.weight)
treat min.weight max.weight mean.weight sum.weights n max.match.ratio
1 0 0.605 4.24 1 112 112 7
2 1 1 1 1 185 185 1
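As a rough check on the numbers above, the arithmetic can be sketched in base R. This assumes 1:1 nearest-neighbor matching, so that each of the 185 treated units contributes exactly one match, and it assumes that a control's weight is its match count multiplied by a common factor. The variable names are illustrative and not part of MatchIt.

```r
# Group sizes read off the summary output above
n_treated <- 185          # treated units, each assumed to have one matched control
n_unique_controls <- 112  # distinct controls matched at least once

# Assumed rule: weight = (times matched) * n_unique_controls / n_treated,
# which makes the control weights sum to n_unique_controls
scale_factor <- n_unique_controls / n_treated

times_matched <- c(least_used = 1, most_used = 7)
control_weights <- times_matched * scale_factor
print(round(control_weights, 3))
```

Under these assumptions a control matched once gets a weight of about 0.605 and a control matched seven times gets about 4.24, which agrees with min.weight and max.weight in the summary.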
To see the distribution of the weights, we can do this:
match.data(m.out1) %>%
  group_by(treat, weights) %>%
  tally() %>%
  group_by(treat) %>%
  mutate(weight.ratio = weights / min(weights))
treat weights n weight.ratio
1 0 0.605 74 1
2 0 1.21 19 2
3 0 1.82 10 3
4 0 2.42 6 4
5 0 3.63 2 6
6 0 4.24 1 7
7 1 1 185 1
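The weight.ratio column equals the number of matches only because the least-used control here was matched exactly once. A slightly more general sketch, under the same 1:1 matching assumption, rescales each weight by the number of treated units over the number of unique controls. The table below is typed in by hand from the output above, so the weights are rounded.

```r
# Weight distribution for the control group, copied from the output above
controls <- data.frame(
  weights = c(0.605, 1.21, 1.82, 2.42, 3.63, 4.24),
  n       = c(74, 19, 10, 6, 2, 1)
)

n_unique_controls <- sum(controls$n)  # number of distinct matched controls
n_treated <- 185                      # from the treat == 1 row

# Undo the scaling; round() absorbs the rounding in the printed weights
controls$times_matched <- round(controls$weights * n_treated / n_unique_controls)
print(controls)

# Sanity check: the match counts should add up to one match per treated unit
total_matches <- sum(controls$n * controls$times_matched)
print(total_matches)
```

If the total comes out equal to the number of treated units, the recovered counts are consistent with every treated unit having received one match.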
The MatchIt vignette has an FAQ at the end. Item 5.3, "How Exactly are the Weights Created?", states that the control group weights are scaled to sum to the number of uniquely matched control units.
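To count matches directly instead of backing them out of the weights, one option is to tabulate the control IDs in the match matrix. The sketch below uses a small made-up matrix (the IDs T1 to T6 and C1 to C3 are hypothetical) in place of m.out1$match.matrix so that it runs without MatchIt. It assumes the match matrix holds control unit names, with NA for any unmatched treated unit.

```r
# Toy stand-in for m.out1$match.matrix: one row per treated unit,
# holding the name of its matched control (NA if unmatched)
match_matrix <- matrix(
  c("C1", "C2", "C1", NA, "C3", "C1"),
  ncol = 1,
  dimnames = list(paste0("T", 1:6), NULL)
)

# How many times each control was used; table() drops NA by default
times_matched <- table(match_matrix)
print(times_matched)

# Rescale so the weights sum to the number of unique matched controls
control_weights <- times_matched * length(times_matched) / sum(times_matched)
print(round(control_weights, 2))
```

With the real object, the same table() call on m.out1$match.matrix should report 7 for "PSID393", in line with the filtered output in the question.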