A general solution to analyze and plot two data frames with varying lengths?

Could you help me?

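I am writing code in R to automate the null model analysis of several networks. First, the code reads several TXT matrices into R. Second, it calculates a topological metric for each network. Third, it randomizes each network N times using a null model. Fourth, it calculates the same topological metric for all randomized versions of the original matrices.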
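To make the four steps above concrete, here is a minimal base-R sketch. Everything specific in it is an assumption made for illustration: the in-memory matrices stand in for the TXT files, `degree_sd` is a hypothetical topological metric, and `shuffle_cells` is a hypothetical null model. The real analysis may use quite different choices.

```r
# Illustrative sketch only: the metric and the null model are placeholders.
set.seed(42)  # make the random matrices and the shuffles reproducible

# Step 1 (stand-in): with real files, something like
# lapply(files, function(f) as.matrix(read.table(f))) could build this list,
# where `files` is a hypothetical vector of TXT file names.
matrices <- list(
  network1 = matrix(rbinom(20, 1, 0.4), nrow = 4),
  network2 = matrix(rbinom(30, 1, 0.5), nrow = 5),
  network3 = matrix(rbinom(42, 1, 0.6), nrow = 6)
)

# Hypothetical metric: standard deviation of the row degrees.
degree_sd <- function(m) sd(rowSums(m != 0))

# Hypothetical null model: shuffle all cells, keeping the matrix size
# and the total number of links.
shuffle_cells <- function(m) matrix(sample(m), nrow = nrow(m), ncol = ncol(m))

n_randomizations <- 10

# Step 2: one observed score per network.
observed <- data.frame(network = names(matrices),
                       observed = vapply(matrices, degree_sd, numeric(1)),
                       row.names = NULL)

# Steps 3 and 4: randomize each network N times and score every randomized version.
randomized <- do.call(rbind, lapply(names(matrices), function(nm) {
  scores <- replicate(n_randomizations,
                      degree_sd(shuffle_cells(matrices[[nm]])))
  data.frame(network = nm,
             randomization = seq_len(n_randomizations),
             randomized = scores)
}))
```

The result is one short data frame (one row per network) and one long data frame (N rows per network), which is the shape of the two example data frames used in the rest of the question.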

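In the fifth and final step, the idea is to compare the observed score against the distribution of randomized scores. First, by simply counting how many randomized scores are above or below the observed score, in order to estimate the P-values. Second, by plotting the distribution of randomized scores as a density and adding a vertical line to show the observed score.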

Here are examples of the data frames to be analyzed:

networks <- paste0("network", 1:3)
randomizations <- 1:10

observed.ex <- data.frame(network = networks,
                          observed = runif(3, min = 0, max = 1))

randomized.ex <- data.frame(network = rep(networks, each = 10),
                            randomization = rep(randomizations, 3),
                            randomized = rnorm(length(networks)*
                                                   length(randomizations),
                                               mean = 0.5, sd = 0.1))

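In the first step of the final analysis, the code estimates the P-values by simple counting. As you can see, I need to duplicate the calculation calls for each network: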

randomized.network1 <- subset(randomized.ex, network == "network1")
sum(randomized.network1$randomized >= observed.ex$observed[1]) /
    length(randomized.network1$randomized)
sum(randomized.network1$randomized <= observed.ex$observed[1]) /
    length(randomized.network1$randomized)

randomized.network2 <- subset(randomized.ex, network == "network2")
sum(randomized.network2$randomized >= observed.ex$observed[2]) /
    length(randomized.network2$randomized)
sum(randomized.network2$randomized <= observed.ex$observed[2]) /
    length(randomized.network2$randomized)

randomized.network3 <- subset(randomized.ex, network == "network3")
sum(randomized.network3$randomized >= observed.ex$observed[3]) /
    length(randomized.network3$randomized)
sum(randomized.network3$randomized <= observed.ex$observed[3]) /
    length(randomized.network3$randomized)

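In the second step of the final analysis, the code makes the density plots. As you can see, I need to make a copy of the vertical-line call for each network: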

library(ggplot2)  # ggplot, geom_density, facet_grid, geom_vline
library(dplyr)    # filter

ggplot(randomized.ex, aes(randomized)) +
    geom_density() +
    facet_grid(network~.) +
    geom_vline(data=filter(randomized.ex, network == "network1"),
               aes(xintercept = observed.ex$observed[1]), colour = "red") + 
    geom_vline(data=filter(randomized.ex, network == "network2"),
               aes(xintercept = observed.ex$observed[2]), colour = "red") + 
    geom_vline(data=filter(randomized.ex, network == "network3"),
               aes(xintercept = observed.ex$observed[3]), colour = "red") 

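Is there a way to make this final analysis more general, so that no matter how many networks are read in at the beginning, it always does the same calculations and plots?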

Thank you very much!

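It looks like this can be wrapped neatly in an lapply loop that iterates over each file. Would the approach below work for you? You could also pass in the file names instead of the file numbers (currently 1:3) and read in your TXT matrices on the first line of the function.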

library(dplyr)   # for %>%, group_by, and summarize
library(ggplot2) # for the density plot
output <- lapply(1:3, function(network_num){
  network <- paste0("network", network_num)
  n_randomizations <- 10
  observed.ex <- runif(1)
  randomized.ex <- rnorm(n_randomizations, mean = 0.5, sd = 0.1)

  return(data.frame(network=network, observed=observed.ex, randomized=randomized.ex))
}) %>% do.call(what = rbind)

output %>%
  group_by(network) %>%
  summarize(p_value=mean(observed>=randomized))

ggplot(output) +
  geom_density(aes(randomized)) +
  facet_grid(network~.) +
  geom_vline(aes(xintercept = observed), col="red")
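If the observed and randomized scores already live in two separate data frames of different lengths, as in the question, one possible alternative is to join them on `network` instead of rebuilding them inside a loop. The sketch below assumes the column names from the question's example (`network`, `observed`, `randomized`); the names `combined`, `p_values`, `p_upper` and `p_lower` are introduced here purely for illustration. It reports both tails, since the question counts scores above and below the observed value.

```r
library(ggplot2)

set.seed(1)  # reproducible example data
networks <- paste0("network", 1:3)
n_randomizations <- 10

# Example data frames with the same layout as in the question.
observed.ex <- data.frame(network = networks, observed = runif(3))
randomized.ex <- data.frame(
  network = rep(networks, each = n_randomizations),
  randomization = rep(seq_len(n_randomizations), 3),
  randomized = rnorm(3 * n_randomizations, mean = 0.5, sd = 0.1)
)

# Attach each network's observed score to its randomized rows
# (3 rows joined onto 30 rows).
combined <- merge(randomized.ex, observed.ex, by = "network")

# Per network: proportion of randomized scores at or above, and at or below,
# the observed score.
p_values <- do.call(rbind, lapply(split(combined, combined$network), function(d) {
  data.frame(network = d$network[1],
             p_upper = mean(d$randomized >= d$observed),
             p_lower = mean(d$randomized <= d$observed))
}))
rownames(p_values) <- NULL
print(p_values)

# A single geom_vline() layer with its own data frame: facet_grid() matches
# rows to panels through the shared `network` column.
p <- ggplot(randomized.ex, aes(randomized)) +
  geom_density() +
  facet_grid(network ~ .) +
  geom_vline(data = observed.ex, aes(xintercept = observed), colour = "red")
print(p)
```

Because nothing here refers to a specific network by name or index, the same code should work for any number of networks, provided both data frames use matching `network` labels.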