Determine the difference between the medians of two groups with a 95% CI in R (not the median of the differences)
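I have data for 49 patients on the change in a continuous outcome and on the baseline score of that outcome. In addition, I have split the patients at the median baseline score into a low-baseline group (Q1) and a high-baseline group (Q2). The data look like this: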
library(boot)
mydata <-
structure(
list(
ID=c(4, 13, 20, 24, 30, 34, 37, 38, 48, 49, 51, 52, 54, 58, 75, 80, 81, 82, 83, 84, 92, 95, 103, 104, 115,
117, 125, 127, 138, 141, 153, 160, 172, 180, 185, 197, 198, 202, 205, 213, 221, 253, 255, 258, 262,
271, 277, 279, 320),
change_continuous_outcome = c(694, 52, 1500, 195, 53, 54, -500, 2, -21, 394, -10, -38, 43, 1500,
-500, -11, 8, 149, 0, 473, 8, 797, 313, 9, 263, 1219, 68, 216,
75, 0, 95, 698, -1, 750, 168, 251, -381, 19, 70, 0, 182, 4, -28,
36, 37, 18, 3, 928, -4),
baseline_continuous_outcome = c(2646.8, 3112.4, 10661.6, 5706.7, 81.5, 3730.4, 196.1, 83.9, 177.3, 1976.7,
3196.8, 2007.5, 63.2, 7594.5, 3261.8, 155.2, 57.2, 11189.7, 0,
2800.8, 13.9, 3484.5, 3528.1, 3636.6, 9.1, 5681.4, 67.9, 205.4, 138.4,
3141.1, 138.5, 3795.9, 152.7, 7349.1, 2123.4, 122, 5935.8, 100.7,
2023.4, 4095.4, 2636.1, 11.9, 2241.1, 198.2, 186, 20.2, 97.7, 6709.8, 169.5),
q2vsq1_baseline_cont_outcome = structure(c(2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L,
1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
1L, 1L, 2L, 1L ), .Label = c("0", "1"), class = "factor")),
row.names = c(NA, -49L),
class = c("tbl_df", "tbl", "data.frame"))
I performed a Wilcoxon rank-sum test to compare the change_continuous_outcome variable between patients with low and high baseline scores:
wilcox.test(mydata$change_continuous_outcome ~ mydata$q2vsq1_baseline_cont_outcome)
Wilcoxon rank sum test with continuity correction
data: mydata$change_continuous_outcome by mydata$q2vsq1_baseline_cont_outcome
W = 201.5, p-value = 0.04995
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(x = c(53, -500, 2, -21, 394, 43, -11, 8, :
cannot compute exact p-value with ties
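Now I am interested in calculating the difference between the two group medians, together with a 95% confidence interval. I would like to use the boot function for this. Its statistic function takes two arguments: one for the data and one for the indices used to resample the data. So I need to write a function that indexes my data and calculates the difference between the group medians. Borrowing from something I found elsewhere (https://data.library.virginia.edu/the-wilcoxon-rank-sum-test/), I wrote: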
med.diff <- function(d, i) {
mydata <- d[i,]
median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="2"]) -
median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="1"])
}
boot_result <- boot(data=mydata, statistic=med.diff, R=1000)
median(boot_result$t)
boot.ci(boot_result, type = "perc")
But this returns NA. Is there something wrong with my function, or is the problem somewhere else?
Thanks in advance!
As far as I can tell, the NA you are getting comes from the following line:
median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="2"])
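This evaluates to NA. When you defined q2vsq1_baseline_cont_outcome, you made it a factor but relabelled its levels, so the underlying integer codes 1 and 2 appear in the data frame as the labels "0" and "1". Comparing that column with "2" therefore matches no rows, and the median of the resulting empty vector is NA. (The comparison with "1" does match, but it selects the high-baseline group, not the low one.) If you change the function to: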
med.diff <- function(d, i) {
mydata <- d[i,]
median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="1"]) -
median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="0"])
}
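The label mismatch described above can be checked directly. The following is a minimal sketch on a small toy factor (the vectors g and x are hypothetical, not the question's data) built the same way as q2vsq1_baseline_cont_outcome, with integer codes 1 and 2 but labels "0" and "1":

```r
# Toy factor (hypothetical): codes 1 and 2, relabelled as "0" and "1",
# mirroring how q2vsq1_baseline_cont_outcome is defined in the question.
g <- factor(c(1, 2, 2, 1), levels = c(1, 2), labels = c("0", "1"))

levels(g)       # "0" "1"  : the labels that == compares against
as.integer(g)   # 1 2 2 1  : the underlying integer codes
sum(g == "2")   # 0        : no element carries the label "2"

# Subsetting with a comparison that matches nothing yields an empty vector,
# and the median of an empty numeric vector is NA.
x <- c(10, 20, 30, 40)
median(x[g == "2"])   # NA
```

Running levels() on the real column before writing the comparison is a quick way to see which strings the statistic function has to use.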
Re-running boot() and boot.ci() with this corrected function, you get:
median(boot_result$t)
> 143
boot.ci(boot_result, type = "perc")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = boot_result, type = "perc")
Intervals :
Level Percentile
95% ( -1.0, 579.4 )
Calculations and Intervals on Original Scale
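One way to make the statistic less fragile is to refer to the factor levels by position instead of by hard-coded labels. The sketch below is an illustration under stated assumptions, not part of the original answer: the helper name med_diff_by_level and the data frame dat are hypothetical, the seed is arbitrary and only there for reproducibility, and passing the group to boot()'s strata argument is an optional choice that resamples within each group so the group sizes stay fixed. The change scores and group codes are copied from the question.

```r
library(boot)

# Change scores and group codes copied from the question's data (49 patients).
change <- c(694, 52, 1500, 195, 53, 54, -500, 2, -21, 394, -10, -38, 43, 1500,
            -500, -11, 8, 149, 0, 473, 8, 797, 313, 9, 263, 1219, 68, 216,
            75, 0, 95, 698, -1, 750, 168, 251, -381, 19, 70, 0, 182, 4, -28,
            36, 37, 18, 3, 928, -4)
grp_code <- c(2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L,
              1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L,
              1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
              1L, 1L, 2L, 1L)

# Hypothetical data frame with the same labelling as in the question.
dat <- data.frame(
  change = change,
  group  = factor(grp_code, levels = c(1L, 2L), labels = c("0", "1"))
)

# Hypothetical helper: median of the second level minus median of the first,
# looked up via levels() so it does not depend on what the labels are.
med_diff_by_level <- function(d, i) {
  d <- d[i, ]
  lv <- levels(d$group)
  median(d$change[d$group == lv[2]]) - median(d$change[d$group == lv[1]])
}

set.seed(1)  # arbitrary seed, assumed here only for reproducibility
boot_result <- boot(data = dat, statistic = med_diff_by_level, R = 1000,
                    strata = dat$group)  # optional: resample within groups

boot_result$t0                        # observed difference in medians
boot.ci(boot_result, type = "perc")   # percentile 95% confidence interval
```

Because the resampling is random and stratified here, the interval will not necessarily match the one shown above.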