How to compute a permutation test that controls for a second grouping variable?
Run the code below to generate the data set for this question.
library(data.table)
dt <- data.table(
  classA = rep(c("C", "D"), 8),
  classB = rep(c("A", "B"), each = 8),
  priceA = c(0.5, 0.6, 1.1, 1.1, 0.6, 0.6, -0.2, 1.3, -0.4, 3, -0.4, 2.3, -0.2, 1.8, 0.4, 0.4)
)
We apply a permutation test to check whether the variable classA is associated with the price (column priceA). We are given the following code:
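# observed difference in mean price between the two classA levels
T_obs <- dt[, mean(priceA), by = classA][, diff(V1)]
# null distribution: shuffle the prices and recompute the difference
T_star <- sapply(1:1000, function(i) {
  dt[, price_sample := sample(priceA)]
  dt[, mean(price_sample), by = classA][, diff(V1)]
})
p_val_right <- (sum(T_star >= T_obs) + 1) / (1000 + 1)
p_val_left <- (sum(T_star <= T_obs) + 1) / (1000 + 1)
# two-sided p-value
pval <- 2 * min(p_val_left, p_val_right, na.rm = TRUE)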
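- Test whether the association could be confounded by classB. To control for groupB, modify the code: provide R code that computes T_star and T_obs conditioned on the third variable groupB. As the overall test statistic, use the mean of the p-values across the groups defined by groupB. (groupB in the task text appears to refer to the classB column.)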
How do I solve this? I could only find more advanced code online, but we need a simpler solution.
I tried this code:
# classB takes the values "A" and "B"; the price column is priceA
# group A
T_obs <- dt[classB == "A", mean(priceA), by = classA][, diff(V1)]
T_star <- sapply(1:1000, function(i) {
  dt[classB == "A", price_sampled := sample(priceA)]
  dt[classB == "A", mean(price_sampled), by = classA][, diff(V1)]
})
p_val_r <- (sum(T_star >= T_obs) + 1) / (1000 + 1)
pval_l <- (sum(T_star <= T_obs) + 1) / (1000 + 1)
pval_groupA <- 2 * min(pval_l, p_val_r, na.rm = TRUE)
# group B
T_obs <- dt[classB == "B", mean(priceA), by = classA][, diff(V1)]
T_star <- sapply(1:1000, function(i) {
  dt[classB == "B", price_sampled := sample(priceA)]
  dt[classB == "B", mean(price_sampled), by = classA][, diff(V1)]
})
p_val_r <- (sum(T_star >= T_obs) + 1) / (1000 + 1)
pval_l <- (sum(T_star <= T_obs) + 1) / (1000 + 1)
pval_groupB <- 2 * min(pval_l, p_val_r, na.rm = TRUE)
# overall value: mean of the two per-group p-values
pval <- (pval_groupA + pval_groupB) / 2
pval
Is this correct?
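For reference, the per-group procedure described in the task can be sketched more compactly. This is only one reading of the task, under the assumptions that groupB means the classB column and that the overall value is the plain average of the per-group two-sided p-values. The helper perm_pval, the seed, and the cap at 1 are illustrative choices that do not come from the assignment.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so repeated runs give the same numbers

dt <- data.table(
  classA = rep(c("C", "D"), 8),
  classB = rep(c("A", "B"), each = 8),
  priceA = c(0.5, 0.6, 1.1, 1.1, 0.6, 0.6, -0.2, 1.3,
             -0.4, 3, -0.4, 2.3, -0.2, 1.8, 0.4, 0.4)
)

# Hypothetical helper: two-sided permutation p-value for the difference in
# mean price between the two levels of `group`, using only the rows passed in.
perm_pval <- function(price, group, n_perm = 1000) {
  lv <- sort(unique(group))  # expects exactly two levels, here "C" and "D"
  stat <- function(x) mean(x[group == lv[2]]) - mean(x[group == lv[1]])
  T_obs <- stat(price)
  # shuffling the prices passed in keeps the permutation inside this subset
  T_star <- replicate(n_perm, stat(sample(price)))
  p_right <- (sum(T_star >= T_obs) + 1) / (n_perm + 1)
  p_left  <- (sum(T_star <= T_obs) + 1) / (n_perm + 1)
  min(1, 2 * min(p_left, p_right))  # capped at 1 (illustrative choice)
}

# One p-value per level of classB, then their plain average
pvals <- dt[, .(pval = perm_pval(priceA, classA)), by = classB]
pval_mean <- mean(pvals$pval)
pval_mean
```

Because the helper is called once per classB level through by = classB, prices are only ever shuffled among rows that share the same classB value, which is what conditioning on the third variable amounts to here.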