Applying a vegan function across all combinations of variables present in a data frame in R

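I am trying to compute the Bray-Curtis dissimilarity between observations made by volunteers and by professional biologists. I have a dataset I am working from, "tidy", and a results dataset I would like to add the results to, "BCresult.ex". For each unique existing combination of "Workbook", "Location", and "Method", I want to compare the "Vol" observation row with the "Bio" observation row. Each unique existing Workbook/Location/Method combination corresponds to a sample taken at a site.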

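I have successfully used "filter" to subset a single Workbook/Location/Method combination, compared the observations in the "Vol" and "Bio" rows, and appended the result to the last column of "BCresult.ex". I would like to figure out how to apply this to every combination present in "tidy".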

tidy

# A tibble: 24 × 7
   Workbook Location   Method Observer  Worm Larvae Swimmer
   <chr>    <chr>      <chr>  <chr>    <dbl>  <dbl>   <dbl>
 1 A        BigCreek   K      Vol          4      4       1
 2 A        BigCreek   K      Bio          4      4       1
 3 A        BigCreek   L      Vol          2      3       2
 4 A        BigCreek   L      Bio          5      2       3
 5 A        BigCreek   V      Vol          5      2       3
 6 A        BigCreek   V      Bio          3      3       3
 7 A        SmallCreek K      Vol          1      1       2
 8 A        SmallCreek K      Bio          2      4       2
 9 A        SmallCreek L      Vol          2      4       3
10 A        SmallCreek L      Bio          4      2       4
# … with 14 more rows

BCresult.ex

   Workbook   Location Method Score Bray
1         A   BigCreek      K     2   NA
2         A   BigCreek      L     3   NA
3         A   BigCreek      V     1   NA
4         A SmallCreek      K     3   NA
5         A SmallCreek      L     1   NA
6         A SmallCreek      V     2   NA
7         B   BigCreek      K     1   NA
8         B   BigCreek      L     1   NA
9         B   BigCreek      V     1   NA
10        B SmallCreek      K     1   NA
11        B SmallCreek      L     3   NA
12        B SmallCreek      V     3   NA

I can run this:

Observation <- filter(tidy, Workbook == "A" & 
                         Location == "BigCreek" &
                         Method == "K")

BrayC = vegdist(Observation[,5:7], "bray") 
BrayC 

BCresult.ex %>%
  mutate(Bray = BrayC)

which produces this:

   Workbook   Location Method Score Bray
1         A   BigCreek      K     2    0
2         A   BigCreek      L     3    0
3         A   BigCreek      V     1    0
4         A SmallCreek      K     3    0
5         A SmallCreek      L     1    0
6         A SmallCreek      V     2    0
7         B   BigCreek      K     1    0
8         B   BigCreek      L     1    0
9         B   BigCreek      V     1    0
10        B SmallCreek      K     1    0
11        B SmallCreek      L     3    0
12        B SmallCreek      V     3    0

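I am not sure how to set up the loop. Do I list all the variables and loop over them? Or list all existing combinations and use that? The real dataset does not contain every possible combination. Or do I just point to a position (every two rows) and work from there?

Ideally, once I figure out how to loop, I would end up with: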

   Workbook   Location Method Score       Bray
1         A   BigCreek      K     2 0.00000000
2         A   BigCreek      L     3 0.29411760
3         A   BigCreek      V     1 0.15789470
4         A SmallCreek      K     3 0.33333330
5         A SmallCreek      L     1 0.26315790
6         A SmallCreek      V     2 0.30000000
7         B   BigCreek      K     1 0.30000000
8         B   BigCreek      L     1 0.18181820
9         B   BigCreek      V     1 0.14285710
10        B SmallCreek      K     1 0.44444440
11        B SmallCreek      L     3 0.08333333
12        B SmallCreek      V     3 0.17647060
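
For reference, the Bray-Curtis dissimilarity between two rows of counts is the sum of absolute differences divided by the sum of all counts. The sketch below checks the A/BigCreek/L value by hand in base R; `bray_pair` is a hypothetical helper written for illustration, not a vegan function.

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two count vectors,
# computed as sum(|x - y|) / sum(x + y)
bray_pair <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

vol <- c(Worm = 2, Larvae = 3, Swimmer = 2)  # A / BigCreek / L, Vol row
bio <- c(Worm = 5, Larvae = 2, Swimmer = 3)  # A / BigCreek / L, Bio row

bray_pair(vol, bio)  # 5 / 17, about 0.294
```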

Whew, you made it to the end! I hope this is clear; I am new to asking questions here and new to R. Thank you very much.

library(tidyverse)
library(vegan)

tidy = read.table(
  header = TRUE,text="
Workbook Location   Method Observer  Worm Larvae Swimmer
 A        BigCreek   K      Vol          4      4       1
 A        BigCreek   K      Bio          4      4       1
 A        BigCreek   L      Vol          2      3       2
 A        BigCreek   L      Bio          5      2       3
 A        BigCreek   V      Vol          5      2       3
 A        BigCreek   V      Bio          3      3       3
 A        SmallCreek K      Vol          1      1       2
 A        SmallCreek K      Bio          2      4       2
 A        SmallCreek L      Vol          2      4       3
 A        SmallCreek L      Bio          4      2       4
 B        BigCreek   K      Vol          4      4       1
 B        BigCreek   K      Bio          4      4       1
 B        BigCreek   L      Vol          2      3       2
 B        BigCreek   L      Bio          5      2       3
 B        BigCreek   V      Vol          5      2       3
 B        BigCreek   V      Bio          3      3       3
 B        SmallCreek K      Vol          1      1       2
 B        SmallCreek K      Bio          2      4       2
 B        SmallCreek L      Vol          2      4       3
 B        SmallCreek L      Bio          4      2       4  
  ") %>% 
  as_tibble() %>% 
  mutate(
    Workbook = Workbook %>% fct_inorder(),
    Location = Location %>% fct_inorder(),
    Method = Method  %>% fct_inorder(),
    Observer = Observer %>% fct_inorder()
  )

BResult.ex = read.table(
  header = TRUE,text="
  Workbook   Location Method Score Bray
         A   BigCreek      K     2   NA
         A   BigCreek      L     3   NA
         A   BigCreek      V     1   NA
         A SmallCreek      K     3   NA
         A SmallCreek      L     1   NA
         A SmallCreek      V     2   NA
         B   BigCreek      K     1   NA
         B   BigCreek      L     1   NA
         B   BigCreek      V     1   NA
         B SmallCreek      K     1   NA
         B SmallCreek      L     3   NA
         B SmallCreek      V     3   NA
  ") %>% 
  as_tibble() %>% 
  mutate(
    Workbook = Workbook %>% fct_inorder(),
    Location = Location %>% fct_inorder(),
    Method = Method  %>% fct_inorder()
  )

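# Compute the Bray-Curtis dissimilarity for one group (expects the Vol and Bio rows)
# and store it in the Bray column on every row of that group
fVeg = function(data){
  xveg = data %>% select(Worm:Swimmer)
  if(any(is.na(xveg$Worm), is.na(xveg$Larvae), is.na(xveg$Swimmer))) {
    # Combination has no observations in tidy: leave Bray as NA
    data %>% mutate(Bray = rep(NA_real_, nrow(xveg)))
  } else {
    # as.numeric() drops the "dist" class so Bray is a plain numeric column
    data %>% mutate(Bray = rep(as.numeric(vegdist(xveg, "bray")), nrow(xveg)))
  }
} 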
  

  
BResult.ex %>% left_join(
  tidy, by = c("Workbook", "Location", "Method")
) %>% group_by(Workbook, Location, Method) %>% 
  group_modify(~fVeg(.x))

Output:

# A tibble: 22 x 9
# Groups:   Workbook, Location, Method [12]
   Workbook Location   Method Score  Bray Observer  Worm Larvae Swimmer
   <fct>    <fct>      <fct>  <int> <dbl> <fct>    <int>  <int>   <int>
 1 A        BigCreek   K          2 0     Vol          4      4       1
 2 A        BigCreek   K          2 0     Bio          4      4       1
 3 A        BigCreek   L          3 0.294 Vol          2      3       2
 4 A        BigCreek   L          3 0.294 Bio          5      2       3
 5 A        BigCreek   V          1 0.158 Vol          5      2       3
 6 A        BigCreek   V          1 0.158 Bio          3      3       3
 7 A        SmallCreek K          3 0.333 Vol          1      1       2
 8 A        SmallCreek K          3 0.333 Bio          2      4       2
 9 A        SmallCreek L          1 0.263 Vol          2      4       3
10 A        SmallCreek L          1 0.263 Bio          4      2       4
# ... with 12 more rows
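
If one row per combination is wanted instead of one row per observer, something along these lines might work. This is only a sketch in base R under stated assumptions: `tidy_small`, `results`, and `bray_for` are hypothetical names, the data is a cut-down copy of the question's example, and the helper computes the two-row Bray-Curtis value by hand rather than calling `vegdist`.

```r
# Cut-down copy of the question's data; A / SmallCreek / L is deliberately
# absent to mimic a combination that is missing from the real data
tidy_small <- data.frame(
  Workbook = "A",
  Location = c("BigCreek", "BigCreek", "BigCreek", "BigCreek",
               "SmallCreek", "SmallCreek"),
  Method   = c("K", "K", "L", "L", "K", "K"),
  Observer = c("Vol", "Bio", "Vol", "Bio", "Vol", "Bio"),
  Worm     = c(4, 4, 2, 5, 1, 2),
  Larvae   = c(4, 4, 3, 2, 1, 4),
  Swimmer  = c(1, 1, 2, 3, 2, 2)
)

# One row per combination, as in BCresult.ex
results <- data.frame(
  Workbook = "A",
  Location = c("BigCreek", "BigCreek", "SmallCreek", "SmallCreek"),
  Method   = c("K", "L", "K", "L"),
  Score    = c(2, 3, 3, 1)
)

taxa <- c("Worm", "Larvae", "Swimmer")

# Hypothetical helper: Bray-Curtis between the Vol and Bio rows of one combination
bray_for <- function(wb, loc, m) {
  d <- tidy_small[tidy_small$Workbook == wb &
                    tidy_small$Location == loc &
                    tidy_small$Method == m, ]
  vol <- unlist(d[d$Observer == "Vol", taxa])
  bio <- unlist(d[d$Observer == "Bio", taxa])
  # Return NA unless there is exactly one Vol row and one Bio row
  if (length(vol) != length(taxa) || length(bio) != length(taxa)) {
    return(NA_real_)
  }
  sum(abs(vol - bio)) / sum(vol + bio)
}

# Apply the helper to each row of the results table
results$Bray <- mapply(bray_for, results$Workbook, results$Location,
                       results$Method, USE.NAMES = FALSE)
results
```

Combinations that have no matching rows in the data keep `NA` in `Bray`, and the column stays numeric throughout.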

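My mentor and I ended up using this for loop, since he is more familiar with loops. It produced the results I was after, both on this example and on my real dataset.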

# Loop over each row (Workbook/Location/Method combination) of the results table
for(i in seq_len(nrow(BCresult.ex)))
{
  
  # Keys identifying the current combination
  workbook.temp.ex <- BCresult.ex[i, "Workbook"]
  location.temp.ex <- BCresult.ex[i, "Location"]
  method.temp.ex <- BCresult.ex[i, "Method"]
  
  # Count columns (5:7) of the matching Vol and Bio rows
  BCdata.temp.ex <- tidy[tidy$Workbook == workbook.temp.ex &
                          tidy$Location == location.temp.ex &
                          tidy$Method == method.temp.ex,
                        5:7]
  
  BCresult.ex[i, "Bray"]  <- vegdist(BCdata.temp.ex,'bray')
  # test  <- as.numeric(vegdist(BCdata.temp,'bray'))
  
}

Result:

> BCresult.ex
   Workbook   Location Method Score               Bray
1         A   BigCreek      K     2                  0
2         A   BigCreek      L     3  0.294117647058824
3         A   BigCreek      V     1  0.157894736842105
4         A SmallCreek      K     3  0.333333333333333
5         A SmallCreek      L     1  0.263157894736842
6         A SmallCreek      V     2                0.3
7         B   BigCreek      K     1                0.3
8         B   BigCreek      L     1  0.181818181818182
9         B   BigCreek      V     1  0.142857142857143
10        B SmallCreek      K     1  0.444444444444444
11        B SmallCreek      L     3 0.0833333333333333
12        B SmallCreek      V     3  0.176470588235294

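The only issue: the Bray-Curtis values from vegan's "bray" method ended up stored as character rather than numeric. We simply exported to .csv and then reloaded that .csv into R to work around it. Using as.numeric did not seem to help.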
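
Note that the commented-out line in the loop refers to `BCdata.temp`, while the object built in the loop is `BCdata.temp.ex`, which may be part of why that attempt appeared not to work. As a possible alternative to the .csv round trip, the column could be converted in place. This is a minimal sketch, assuming the column really holds numbers stored as text; `bray_chr` is a hypothetical stand-in for `BCresult.ex$Bray`.

```r
# Stand-in for a Bray column that ended up as character
bray_chr <- c("0", "0.294117647058824", "0.3")

# Character to numeric conversion
bray_num <- as.numeric(bray_chr)

# For a factor column, convert via character first so that the labels,
# not the internal integer codes, are parsed
bray_fct <- factor(bray_chr)
bray_from_fct <- as.numeric(as.character(bray_fct))

is.numeric(bray_num)
```

Inside the loop itself, wrapping the call as `as.numeric(vegdist(BCdata.temp.ex, 'bray'))` would drop the "dist" class before the value is stored.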