R MatchIt: Using ratio and caliper arguments seems to force replace=T
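I am using the MatchIt package in R to perform propensity score matching. The matching method is nearest neighbor and the distance measure is glm. When I leave ratio and caliper at their defaults, or set only one of them to a non-default value, I get the weight distribution I expect for matching without replacement, i.e. all 0 or 1. However, when I set both to non-default values, I get some weights above 1 and some between 0 and 1, which is the distribution I associate with replacement.
Am I misunderstanding the difference between matching with and without replacement, or does this situation override the replace=F argument? I have read the package documentation, but it is quite possible that I missed or misunderstood the part that explains this. If so, please feel free to point me to the relevant section!
A (hopefully) reproducible example: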
library(MatchIt)

set.seed(42)
DF <- data.frame(
  Group = factor(c(rep("Treatment", 40), rep("Control", 360))),
  mVar1 = factor(c(sample(LETTERS[c(1, 1, 1, 2)], 40, replace = TRUE),
                   sample(LETTERS[c(1, 2)], 360, replace = TRUE))),
  mVar2 = factor(c(sample(LETTERS[c(3, 3, 4, 4, 4, 5)], 40, replace = TRUE),
                   sample(LETTERS[c(3, 4, 5)], 360, replace = TRUE))),
  mVar3 = c(rpois(40, 3), rpois(360, 1))
)
str(DF)

# Non-default ratio only
(m1 <- matchit(Group ~ mVar1 + mVar2 + mVar3, data = DF, method = "nearest",
               distance = "glm", ratio = 3, replace = FALSE))
plot(m1, type = "jitter", interactive = FALSE)
hist(m1$weights)

# Non-default caliper only
(m2 <- matchit(Group ~ mVar1 + mVar2 + mVar3, data = DF, method = "nearest",
               distance = "glm", caliper = 0.1, replace = FALSE))
plot(m2, type = "jitter", interactive = FALSE)
hist(m2$weights)

# Non-default ratio and caliper together
(m3 <- matchit(Group ~ mVar1 + mVar2 + mVar3, data = DF, method = "nearest",
               distance = "glm", ratio = 3, caliper = 0.1, replace = FALSE))
plot(m3, type = "jitter", interactive = FALSE)
hist(m3$weights)
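Thanks!
The matching weights are computed using the formula described in ?matchit. This formula is used when matching without replacement, as you have done. Here is the formula: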
Each unit is assigned to a subclass, which represents the pair they
are a part of (in the case of k:1 matching) or the stratum they belong
to (in the case of exact matching, coarsened exact matching, full
matching, or subclassification). The formula for computing the weights
depends on the argument supplied to estimand. A new stratum
"propensity score" (p) is computed as the proportion of units in each
stratum that are in the treated group, and all units in that stratum
are assigned that propensity score. Weights are then computed using
the standard formulas for inverse probability weights: for the ATT,
weights are 1 for the treated units and p/(1-p) for the control units;
for the ATC, weights are (1-p)/p for the treated units and 1 for the
control units; for the ATE, weights are 1/p for the treated units and
1/(1-p) for the control units.
...
In each treatment group, weights are divided by the mean of the
nonzero weights in that treatment group to make the weights sum to the
number of units in that treatment group.
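When a constant matching ratio is used (e.g., every treated unit gets 1 match, or every treated unit gets 3 matches), the weights are the same for all control units. Otherwise, the weights vary across control units. With ratio=3 and a caliper together, some treated units cannot find 3 controls within the caliper and end up with fewer matches, so the ratio is no longer constant. What you are seeing is that variation in the control unit weights. It does occur when matching with replacement, but it also occurs with variable-ratio matching or full matching, even when matching without replacement.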
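To make the arithmetic concrete, here is a minimal base R sketch that applies the quoted ATT formula to a made-up matched sample in which three treated units received 3, 2, and 1 controls. The vectors subclass and treat and the variable names are hypothetical, and this is an illustration of the documented formula, not MatchIt's internal code.

```r
# Toy matched sample (hypothetical): 3 treated units matched to 3, 2, and 1 controls.
subclass <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
treat    <- c(1, 0, 0, 0, 1, 0, 0, 1, 0)

# Stratum "propensity score": proportion of treated units in each subclass.
p <- ave(treat, subclass, FUN = mean)

# ATT weights: 1 for treated units, p / (1 - p) for control units.
w <- ifelse(treat == 1, 1, p / (1 - p))

# Within each treatment group, divide by the mean of the nonzero weights.
w_scaled <- w / ave(w, treat, FUN = function(x) mean(x[x > 0]))

round(w_scaled[treat == 0], 3)
# 0.667 0.667 0.667 1.000 1.000 2.000
```

The control weights sum to 6, the number of matched controls, yet they include values below and above 1 even though no control is used more than once.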
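To check whether you are actually matching with replacement, run table(table(m3$match.matrix)). table(m3$match.matrix) tells you how many times each control unit was used as a match, and running table() on that output tells you how many control units were used each number of times. You will see that every control unit is used only once, so the output of table(table()) has a single entry, which indicates that the matching was done without replacement.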
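As a small illustration of that check, the sketch below runs it on a hand-built stand-in for a match matrix, assuming the usual layout of one row per treated unit, one column per match, and NA where no match was found. The matrix mm and its unit names are made up. The last line is an extra diagnostic, not part of the check above: it counts matches per treated unit, which shows whether the ratio is constant.

```r
# Hypothetical match matrix: rows are treated units, entries are matched control names.
mm <- matrix(c("41", "42", "43",
               "44", "45", NA,
               "46", NA,   NA),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("1", "2", "3"), NULL))

# How many times was each control used? table() drops NA by default.
table(mm)

# A single entry named "1" means no control was reused (no replacement).
table(table(mm))

# Number of matches each treated unit received; unequal counts mean a variable ratio.
rowSums(!is.na(mm))
```

With a real fit, replacing mm with m3$match.matrix applies the same checks to the matched sample.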