Get p-value with two variables and multiple row names

Question

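I'm stuck and would appreciate help calculating p-values from this simple data.frame. My data frame is called my_data. Looking at it, you can see the comparable values I'm comparing: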

my_data <- read.csv("densityleftOK.csv", stringsAsFactors = FALSE)[c(1,2,3),]

      P1    P2   P3  P4  P5   T1  T2  T3  T4  T5  T6
A     1008 1425 869 1205 954  797 722 471 435 628 925
B      550  443 317  477 337  383  54 111  27 239 379
C      483  574 597  375 593  553 249 325 238 354 411

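So I want to get one p-value per row by comparing the placebo samples with the treated samples. If you don't mind, I would also like to know the standard deviation between the placebo (P) and treatment (T) groups.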

Thanks for your help.

Answer 1

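You can try something like the following: convert the data to long format, group by id, introduce a grouping vector ("P" or "T"), and use tidy() on the t.test() result to wrap it into a table: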

library(broom)
library(tidyr)
library(dplyr)
library(tibble)

data = read.table(text="P1    P2   P3  P4  P5   T1  T2  T3  T4  T5  T6
A     1008 1425 869 1205 954  797 722 471 435 628 925
B      550  443 317  477 337  383  54 111  27 239 379
C      483  574 597  375 593  553 249 325 238 354 411",header=TRUE,row.names=1)

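res = data %>% 
  rownames_to_column("id") %>%                  # keep the row names (A, B, C) as a column
  pivot_longer(-id) %>%                         # long format: one row per id/sample pair
  mutate(grp = sub("[0-9]", "", name)) %>%      # "P1" -> "P", "T3" -> "T"
  group_by(id) %>% 
  do(tidy(t.test(value ~ grp, data = .))) %>%   # Welch t-test per id, tidied into one row
  select(c(id, estimate, estimate1, estimate2, statistic, p.value)) %>%
  mutate(stderr = estimate / statistic)         # standard error of the mean difference

res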

# A tibble: 3 x 7
# Groups:   id [3]
  id    estimate estimate1 estimate2 statistic p.value stderr
  <chr>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>  <dbl>
1 A         429.     1092.      663       3.40 0.00950  126. 
2 B         226.      425.      199.      2.89 0.0192    78.2
3 C         169.      524.      355       2.65 0.0266    64.0
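To see where the stderr column comes from: t.test() runs a Welch two-sample test by default (var.equal = FALSE), whose statistic is the difference in group means divided by sqrt(s1^2/n1 + s2^2/n2), so estimate / statistic recovers that denominator. Below is a minimal base-R sketch that checks this by hand for row A; welch_parts is a hypothetical helper written for illustration, not part of any package.

```r
# Row A values from the question, split into placebo (P) and treated (T)
p_vals <- c(1008, 1425, 869, 1205, 954)
t_vals <- c(797, 722, 471, 435, 628, 925)

# Hypothetical helper: Welch t statistic computed by hand
welch_parts <- function(x, y) {
  se   <- sqrt(var(x) / length(x) + var(y) / length(y))  # standard error of the mean difference
  diff <- mean(x) - mean(y)                              # estimate1 - estimate2
  c(diff = diff, stderr = se, statistic = diff / se)
}

manual  <- welch_parts(p_vals, t_vals)
builtin <- t.test(p_vals, t_vals)  # Welch test is the default

manual
builtin$statistic
builtin$stderr
```

The hand-computed statistic and standard error should agree with what t.test() reports, which is why dividing estimate by statistic in the pipeline gives the standard error.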

If you don't want to use packages, it's a matter of using apply; I think it's easier to declare the groups beforehand:

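# Group labels from the column names: "P" or "T"
grp = gsub("[0-9]","",colnames(data))

# Run a Welch t-test on each row and keep the statistic, p-value and standard error
res = apply(data,1,function(i){
  data.frame(t.test(i~grp)[c("statistic","p.value","stderr")])
})

# Stack the one-row data frames; the row names become A, B, C
res = do.call(rbind,res)
res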
  statistic     p.value    stderr
A  3.395303 0.009498631 126.40994
B  2.890838 0.019173060  78.16650
C  2.646953 0.026608838  63.99812
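The question also asks about the spread of the placebo and treated groups. t.test() reports the standard error of the difference in means, not a standard deviation per group; if per-group standard deviations are what is wanted, a minimal base-R sketch along the same lines (assuming the same "P"/"T" column naming, with the data entered directly as a matrix) could be:

```r
# Data from the question, entered directly as a matrix
m <- matrix(c(1008, 1425, 869, 1205, 954, 797, 722, 471, 435, 628, 925,
               550,  443, 317,  477, 337, 383,  54, 111,  27, 239, 379,
               483,  574, 597,  375, 593, 553, 249, 325, 238, 354, 411),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"),
                            c(paste0("P", 1:5), paste0("T", 1:6))))

# Group label per column: strip the digits, leaving "P" or "T"
grp <- sub("[0-9]+", "", colnames(m))

# Standard deviation within each group, row by row (rows A-C, columns P and T)
sd_by_group <- t(apply(m, 1, function(row) tapply(row, grp, sd)))
sd_by_group
```

tapply() splits each row by grp and applies sd() to each piece; apply() collects the results column-wise, so t() turns them back into one row per id.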
