Exact age matching with MatchIt doesn't work
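I am trying to perform exact case-control matching by age.
My database consists of 139 eyes from 75 patients, split into 2 groups by a dichotomous variable (G6PDcarente = 0/1).
I am trying to do the matching with the following code: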
match.it <- matchit(G6PDcarente~age, data = newdata, method="exact",ratio=1,replace=FALSE)
match.it
The problem is that the result is:
Exact Subclasses: 14
Sample sizes:
Control Treated
All 43 85
Matched 31 42
Unmatched 12 43
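Why are the matched sample sizes so different between the two groups?
Shouldn't the matched control and treated samples be the same size (e.g. 31-31)?
How can I get an exact age match with the same sample size in both groups?
I also tried this code: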
match.it <- matchit(G6PDcarente~age, data = newdata, method="nearest",exact="age",ratio=1, replace=FALSE)
But I get the following error message:
Error in Ops.data.frame(exact[itert, k], exact[clabels, k]) :
‘!=’ only defined for equally-sized data frames
Inoltre: Warning message:
In matchit2nearest(c(`1` = 0, `2` = 0, `3` = 0, `4` = 0, `5` = 0, :
Fewer control than treated units and matching without replacement. Not all treated units will receive a match. Treated units will be matched in the order specified by m.order: largest
Can anyone help me?
Thanks
Here is the code to reproduce a sample of my data:
newdata <- structure(list(NumeroProgressivo = c(43, 44, 137, 138, 129, 130,
65, 111, 148, 149, 35, 36, 83, 84, 37, 38, 127, 128, 160, 161,
75, 76, 53, 54, 119, 120, 109, 110, 57, 58, 39, 51, 52, 29, 30,
71, 72, 154, 155, 77, 78, 1, 2, 61, 62, 158, 101, 102, 27, 28,
73, 103, 104, 121, 122, 152, 153, 107, 108, 45, 46, 81, 82, 139,
140, 59, 60, 95, 96, 33, 34, 91, 92, 26, 49, 50, 79, 6, 63, 64,
15, 16, 31, 32, 143, 144, 69, 70, 89, 90, 41, 42, 17, 18, 67,
68, 115, 116, 150, 151, 97, 98, 93, 94, 135, 136, 55, 56, 131,
132, 162, 163, 21, 22, 23, 24, 156, 157, 133, 166, 174, 175,
164, 165, 172, 173, 176, 177), IDpaziente = c(22, 22, 67, 67,
63, 63, 33, 56, 73, 73, 18, 18, 42, 42, 19, 19, 62, 62, 79, 79,
38, 38, 27, 27, 60, 60, 55, 55, 29, 29, 20, 26, 26, 15, 15, 36,
36, 76, 76, 39, 39, 1, 1, 31, 31, 78, 51, 51, 14, 14, 37, 52,
52, 61, 61, 75, 75, 54, 54, 23, 23, 41, 41, 68, 68, 30, 30, 48,
48, 17, 17, 46, 46, 13, 25, 25, 40, 3, 32, 32, 8, 8, 16, 16,
70, 70, 35, 35, 45, 45, 21, 21, 9, 9, 34, 34, 58, 58, 74, 74,
49, 49, 47, 47, 66, 66, 28, 28, 64, 64, 80, 80, 11, 11, 12, 12,
77, 77, 65, 82, 86, 86, 81, 81, 85, 85, 87, 87), Occhio = c("OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OD", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS",
"OD", "OD", "OS", "OD", "OS", "OD", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OS", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OD", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS"), G6PDcarente = c(0, 0, 0,
0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1,
0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
age = c(70, 70, 38, 38, 54, 54, 41, 74, 31, 31, 27, 27, 36,
36, 36, 36, 49, 49, 34, 34, 49, 49, 34, 34, 33, 33, 34, 34,
38, 38, 62, 30, 30, 38, 38, 53, 53, 27, 27, 57, 57, 84, 84,
25, 25, 26, 57, 57, 47, 47, 29, 31, 31, 26, 26, 23, 23, 34,
34, 48, 48, 34, 34, 34, 34, 40, 40, 45, 45, 33, 33, 61, 61,
73, 32, 32, 67, 80, 39, 39, 67, 67, 37, 37, 28, 28, 26, 26,
32, 32, 24, 24, 61, 61, 36, 36, 66, 66, 26, 26, 35, 35, 39,
39, 32, 32, 39, 39, 39, 39, 42, 42, 35, 35, 64, 64, 34, 34,
37, 61, 80, 80, 74, 74, 62, 62, 71, 71)), row.names = c(NA,
-128L), class = c("tbl_df", "tbl", "data.frame"))
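The numbers of observations assigned to the control and treated groups are exactly what they should be, because the assignment is based on the values in the G6PDcarente variable.
From the help file ?matchit:
(For the first argument in the function, formula) This argument takes the usual syntax of R formula, treat ~ x1 + x2, where treat is a binary treatment indicator and x1 and x2 are the pre-treatment covariates.
In your case, the formula corresponds to G6PDcarente~age, and the number of observations with G6PDcarente == 1 differs from the number with G6PDcarente == 0.
We can verify this directly by manual inspection, since the dataset is not very large: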
library(dplyr)
library(tidyr)
new.data.check <- newdata %>%
count(age, G6PDcarente) %>% # count all unique combinations of age & G6PDcarente
spread(G6PDcarente, n) %>% # create separate columns for G6PDcarente == 0 / == 1
na.omit() # remove NA rows, where a specific age only has G6PDcarente == 0
# OR G6PDcarente == 1, but not both (i.e. unmatched samples)
> new.data.check
# A tibble: 14 x 3
age `0` `1`
<dbl> <int> <int>
1 26 3 4
2 27 2 2
3 31 2 2
4 32 2 4
5 34 6 8
6 37 1 2
7 38 2 4
8 39 2 6
9 49 2 2
10 61 1 4
11 62 2 1
12 67 2 1
13 74 2 1
14 80 2 1
For the age values that have both G6PDcarente == 0 and G6PDcarente == 1, there are 31 observations with G6PDcarente == 0 and 42 observations with G6PDcarente == 1:
> colSums(new.data.check)
age 0 1
657 31 42
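Without knowing your exact use case, I suppose that if you really want equal numbers of treated and control units, you could always drop some observations...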
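One way to drop observations so that both groups end up the same size within each age is sketched below in base R. This is only an illustration under assumptions: the toy data frame, the column name group, and the helper balance_by_age are made up for the example and are not part of MatchIt or of the original data. It also drops rows at random and ignores that two eyes can belong to the same patient, which may matter for the real analysis.

```r
# Toy data (hypothetical, not the poster's dataset): age and a binary group flag
toy <- data.frame(
  age   = c(26, 26, 26, 26, 26, 27, 27, 27, 30, 30),
  group = c(0,  0,  1,  1,  1,  0,  1,  1,  1,  1)
)

# Hypothetical helper: within each age, keep min(n0, n1) rows from each group
balance_by_age <- function(d, seed = 1) {
  set.seed(seed)                      # make the random down-sampling reproducible
  strata <- split(d, d$age)           # one data frame per distinct age
  kept <- lapply(strata, function(s) {
    idx0 <- which(s$group == 0)
    idx1 <- which(s$group == 1)
    k <- min(length(idx0), length(idx1))
    if (k == 0) return(s[0, ])        # age present in only one group: drop it
    # sample positions rather than values, which is safe for length-1 vectors
    keep <- c(idx0[sample.int(length(idx0), k)],
              idx1[sample.int(length(idx1), k)])
    s[sort(keep), ]
  })
  do.call(rbind, kept)
}

balanced <- balance_by_age(toy)
table(balanced$age, balanced$group)   # equal counts per age in both columns
```

Applied to the real data, the same idea would use G6PDcarente in place of group; which rows to drop within an age is a judgment call that the sketch leaves to chance.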
Thanks to @Z.Lin's reply, I found a way to solve my problem.
Here is the code I used, following the instructions of a tutorial:
OCTA.Filtered = as.data.frame(na.omit(OCTA.Filtered))
m.out.test = matchit(G6PDcarente~age,method="nearest", data=OCTA.Filtered, ratio = 1)
test_data = match.data(m.out.test)
ps.sd = sd(test_data$distance)
# matching is performed below using propensity scores given the covariates mentioned below
# caliper = 0.25 times sd of propensity scores (optimal)
m.out = matchit(G6PDcarente~age,method="nearest", data=OCTA.Filtered, caliper = 0.25*ps.sd)
# check the sample sizes (below)
m.out
# Final matched data saved as final_data
final_data = match.data(m.out)
# (here distance = propensity score)
new.data.check <- final_data %>%
count(age, G6PDcarente) %>% # count all unique combinations of age & G6PDcarente
spread(G6PDcarente, n) %>% # create separate columns for G6PDcarente == 0 / == 1
na.omit()
> new.data.check
# A tibble: 14 x 3
age `0` `1`
<dbl> <int> <int>
1 26 3 3
2 27 2 2
3 31 2 2
4 32 2 2
5 34 6 6
6 37 1 1
7 38 2 2
8 39 2 2
9 49 2 2
10 61 1 1
11 62 1 1
12 67 1 1
13 74 1 1
14 80 1 1
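A quick programmatic version of the same check, as a sketch: the counts below are copied by hand from the table above into a small data frame (the names counts, control and treated are chosen for the example), and the code confirms that every age has the same number of units in both groups.

```r
# Counts copied from the table above: matched units per age in each group
counts <- data.frame(
  age     = c(26, 27, 31, 32, 34, 37, 38, 39, 49, 61, 62, 67, 74, 80),
  control = c(3, 2, 2, 2, 6, 1, 2, 2, 2, 1, 1, 1, 1, 1),
  treated = c(3, 2, 2, 2, 6, 1, 2, 2, 2, 1, 1, 1, 1, 1)
)

# TRUE only if every age has identical counts in both groups
balanced_everywhere <- all(counts$control == counts$treated)

# Total matched sample size per group
totals <- colSums(counts[, c("control", "treated")])

print(balanced_everywhere)
print(totals)
```

Since nearest-neighbour matching with a caliper does not by itself guarantee identical ages within a pair, a check like this is worth rerunning whenever the data or the caliper changes.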