Applying a vegan function across all combinations of variables present in a data frame in R
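I am trying to run Bray-Curtis dissimilarity between observations made by volunteers and by professional biologists. I have a dataset, "tidy", that I am working with, and a results dataset, "BCresult.ex", that I want to add the results to. For each unique existing combination of "Workbook", "Location" and "Method", I want to compare the "Vol" observation row with the "Bio" observation row. Each unique existing combination of Workbook/Location/Method corresponds to one sample taken at a site.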
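Using filter(), I have successfully subset one Workbook/Location/Method combination, compared the observations between the "Vol" and "Bio" rows, and appended the result to the last column of "BCresult.ex". I'm trying to figure out how to apply this to every combination present in "tidy".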
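For reference, this is the Bray-Curtis dissimilarity that vegan's vegdist(..., "bray") reports for a pair of rows, with x the Vol counts and y the Bio counts over the species columns k (Worm, Larvae, Swimmer). The worked line uses rows 3 and 4 of tidy (A/BigCreek/L) and reproduces the 0.294 expected for that combination in the table further down.

```latex
\mathrm{BC}(x, y) = \frac{\sum_k \lvert x_k - y_k \rvert}{\sum_k (x_k + y_k)}

% A/BigCreek/L: x = (2, 3, 2) (Vol), y = (5, 2, 3) (Bio)
\mathrm{BC} = \frac{3 + 1 + 1}{7 + 5 + 5} = \frac{5}{17} \approx 0.294
```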
tidy
# A tibble: 24 × 7
Workbook Location Method Observer Worm Larvae Swimmer
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 A BigCreek K Vol 4 4 1
2 A BigCreek K Bio 4 4 1
3 A BigCreek L Vol 2 3 2
4 A BigCreek L Bio 5 2 3
5 A BigCreek V Vol 5 2 3
6 A BigCreek V Bio 3 3 3
7 A SmallCreek K Vol 1 1 2
8 A SmallCreek K Bio 2 4 2
9 A SmallCreek L Vol 2 4 3
10 A SmallCreek L Bio 4 2 4
# … with 14 more rows
BCresult.ex
Workbook Location Method Score Bray
1 A BigCreek K 2 NA
2 A BigCreek L 3 NA
3 A BigCreek V 1 NA
4 A SmallCreek K 3 NA
5 A SmallCreek L 1 NA
6 A SmallCreek V 2 NA
7 B BigCreek K 1 NA
8 B BigCreek L 1 NA
9 B BigCreek V 1 NA
10 B SmallCreek K 1 NA
11 B SmallCreek L 3 NA
12 B SmallCreek V 3 NA
I can run this:
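library(dplyr)
library(vegan)

# Subset a single Workbook/Location/Method combination (one Vol row, one Bio row)
Observation <- filter(tidy, Workbook == "A" &
                        Location == "BigCreek" &
                        Method == "K")
# Bray-Curtis dissimilarity between the two rows, using species columns 5:7
BrayC = vegdist(Observation[,5:7], "bray")
BrayC
# The single value is recycled into every row of BCresult.ex
BCresult.ex %>%
  mutate(Bray = BrayC)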
which produces this:
Workbook Location Method Score Bray
1 A BigCreek K 2 0
2 A BigCreek L 3 0
3 A BigCreek V 1 0
4 A SmallCreek K 3 0
5 A SmallCreek L 1 0
6 A SmallCreek V 2 0
7 B BigCreek K 1 0
8 B BigCreek L 1 0
9 B BigCreek V 1 0
10 B SmallCreek K 1 0
11 B SmallCreek L 3 0
12 B SmallCreek V 3 0
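The all-zero column is what you get when one number is assigned to a whole column: vegdist() on the A/BigCreek/K subset returns a single value (0, because the Vol and Bio rows are identical), and mutate() recycles it to all 12 rows. A minimal base-R sketch of the same effect, with the dissimilarity written out by hand; bray_curtis() and res are hypothetical names, not part of vegan:

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two count vectors,
# sum|x - y| / sum(x + y), the value vegan::vegdist(method = "bray") gives
# for a two-row input
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Vol and Bio counts for Workbook A / BigCreek / K, copied from `tidy`
vol <- c(Worm = 4, Larvae = 4, Swimmer = 1)
bio <- c(Worm = 4, Larvae = 4, Swimmer = 1)
bc_one <- bray_curtis(vol, bio)  # identical rows, so the dissimilarity is 0

# Assigning one value to a column recycles it to every row
res <- data.frame(Method = c("K", "L", "V"), Bray = NA_real_)
res$Bray <- bc_one
res
```

So the single-combination code works; what is missing is computing one value per combination.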
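I'm not sure how to set up the loop. Do I list all the variables and loop through them? Or list all the existing combinations and use that? The real dataset doesn't contain every possible combination. Or do I just step through the data by position, two rows at a time?
Ideally, once I figure out how to loop, I would end up with: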
Workbook Location Method Score Bray
1 A BigCreek K 2 0.00000000
2 A BigCreek L 3 0.29411760
3 A BigCreek V 1 0.15789470
4 A SmallCreek K 3 0.33333330
5 A SmallCreek L 1 0.26315790
6 A SmallCreek V 2 0.30000000
7 B BigCreek K 1 0.30000000
8 B BigCreek L 1 0.18181820
9 B BigCreek V 1 0.14285710
10 B SmallCreek K 1 0.44444440
11 B SmallCreek L 3 0.08333333
12 B SmallCreek V 3 0.17647060
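One way to "list all the existing combinations and use that" is split() with drop = TRUE, which creates one group per Workbook/Location/Method combination that actually occurs, so absent combinations are never visited. A minimal base-R sketch, assuming exactly one Vol and one Bio row per combination; the Bray-Curtis formula is written out by hand in place of vegan::vegdist(), and groups, bray and result are illustrative names. The rows are a subset of the question's tidy data.

```r
# Subset of the question's `tidy` data
tidy <- read.table(header = TRUE, text = "
Workbook Location Method Observer Worm Larvae Swimmer
A BigCreek K Vol 4 4 1
A BigCreek K Bio 4 4 1
A BigCreek L Vol 2 3 2
A BigCreek L Bio 5 2 3
A SmallCreek K Vol 1 1 2
A SmallCreek K Bio 2 4 2
")
species <- c("Worm", "Larvae", "Swimmer")

# drop = TRUE keeps only combinations that occur in the data
groups <- split(tidy, tidy[c("Workbook", "Location", "Method")], drop = TRUE)

bray <- vapply(groups, function(g) {
  vol <- unlist(g[g$Observer == "Vol", species])
  bio <- unlist(g[g$Observer == "Bio", species])
  # Assumed rule: anything other than one Vol row and one Bio row gives NA
  if (length(vol) != length(species) || length(bio) != length(species)) {
    return(NA_real_)
  }
  sum(abs(vol - bio)) / sum(vol + bio)  # Bray-Curtis, as vegdist "bray"
}, numeric(1))

# One row per combination: its key columns plus the dissimilarity
keys <- do.call(rbind, lapply(groups, function(g) g[1, c("Workbook", "Location", "Method")]))
result <- data.frame(keys, Bray = unname(bray), row.names = NULL)
result
```

With the full data, result can then be joined back onto BCresult.ex by the three key columns, for example with merge().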
Whew, you made it to the end! I hope this is clear; I'm new to asking questions and new to R. Thanks so much.
library(tidyverse)
library(vegan)
tidy = read.table(
header = TRUE,text="
Workbook Location Method Observer Worm Larvae Swimmer
A BigCreek K Vol 4 4 1
A BigCreek K Bio 4 4 1
A BigCreek L Vol 2 3 2
A BigCreek L Bio 5 2 3
A BigCreek V Vol 5 2 3
A BigCreek V Bio 3 3 3
A SmallCreek K Vol 1 1 2
A SmallCreek K Bio 2 4 2
A SmallCreek L Vol 2 4 3
A SmallCreek L Bio 4 2 4
B BigCreek K Vol 4 4 1
B BigCreek K Bio 4 4 1
B BigCreek L Vol 2 3 2
B BigCreek L Bio 5 2 3
B BigCreek V Vol 5 2 3
B BigCreek V Bio 3 3 3
B SmallCreek K Vol 1 1 2
B SmallCreek K Bio 2 4 2
B SmallCreek L Vol 2 4 3
B SmallCreek L Bio 4 2 4
") %>%
as_tibble() %>%
mutate(
Workbook = Workbook %>% fct_inorder(),
Location = Location %>% fct_inorder(),
Method = Method %>% fct_inorder(),
Observer = Observer %>% fct_inorder()
)
BResult.ex = read.table(
header = TRUE,text="
Workbook Location Method Score Bray
A BigCreek K 2 NA
A BigCreek L 3 NA
A BigCreek V 1 NA
A SmallCreek K 3 NA
A SmallCreek L 1 NA
A SmallCreek V 2 NA
B BigCreek K 1 NA
B BigCreek L 1 NA
B BigCreek V 1 NA
B SmallCreek K 1 NA
B SmallCreek L 3 NA
B SmallCreek V 3 NA
") %>%
as_tibble() %>%
mutate(
Workbook = Workbook %>% fct_inorder(),
Location = Location %>% fct_inorder(),
Method = Method %>% fct_inorder()
)
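# For one Workbook/Location/Method group: compute the Bray-Curtis dissimilarity
# between its rows and write it into every row of the group; return NA if any
# count is missing (e.g. a combination in BResult.ex with no rows in tidy)
fVeg = function(data){
  xveg = data %>% select(Worm:Swimmer)
  if(any(is.na(xveg$Worm), is.na(xveg$Larvae), is.na(xveg$Swimmer))) {
    data %>% mutate(Bray = rep(NA, nrow(xveg)))
  } else {
    data %>% mutate(Bray = rep(vegdist(xveg, "bray"), nrow(xveg)))
  }
}
# Attach the observations to each result row, then apply fVeg() group by group
BResult.ex %>% left_join(
  tidy, by = c("Workbook", "Location", "Method")
) %>% group_by(Workbook, Location, Method) %>%
  group_modify(~fVeg(.x))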
Output
# A tibble: 22 x 9
# Groups: Workbook, Location, Method [12]
Workbook Location Method Score Bray Observer Worm Larvae Swimmer
<fct> <fct> <fct> <int> <dbl> <fct> <int> <int> <int>
1 A BigCreek K 2 0 Vol 4 4 1
2 A BigCreek K 2 0 Bio 4 4 1
3 A BigCreek L 3 0.294 Vol 2 3 2
4 A BigCreek L 3 0.294 Bio 5 2 3
5 A BigCreek V 1 0.158 Vol 5 2 3
6 A BigCreek V 1 0.158 Bio 3 3 3
7 A SmallCreek K 3 0.333 Vol 1 1 2
8 A SmallCreek K 3 0.333 Bio 2 4 2
9 A SmallCreek L 1 0.263 Vol 2 4 3
10 A SmallCreek L 1 0.263 Bio 4 2 4
# ... with 12 more rows
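The answer above keeps both observation rows per combination (22 rows). If one row per BCresult.ex entry is wanted instead, a loop-free alternative is to put each Vol row beside its Bio row with merge() and compute all dissimilarities at once with rowSums(). This is a minimal base-R sketch, assuming at most one Vol and one Bio row per combination; the hand-written formula stands in for vegan::vegdist(), and paired and out are illustrative names. The data are a subset of the answer's example, in which A/SmallCreek/V has no observations.

```r
# Subset of the answer's data; A/SmallCreek/V has no observations here
tidy <- read.table(header = TRUE, text = "
Workbook Location Method Observer Worm Larvae Swimmer
A BigCreek L Vol 2 3 2
A BigCreek L Bio 5 2 3
A SmallCreek L Vol 2 4 3
A SmallCreek L Bio 4 2 4
")
BCresult.ex <- read.table(header = TRUE, text = "
Workbook Location Method Score Bray
A BigCreek L 3 NA
A SmallCreek L 1 NA
A SmallCreek V 2 NA
")
keys <- c("Workbook", "Location", "Method")
species <- c("Worm", "Larvae", "Swimmer")

# Put each Vol row next to its Bio row: one row per sampled combination
vol <- tidy[tidy$Observer == "Vol", c(keys, species)]
bio <- tidy[tidy$Observer == "Bio", c(keys, species)]
paired <- merge(vol, bio, by = keys, suffixes = c(".vol", ".bio"))

# Bray-Curtis for every pair at once: sum|x - y| / sum(x + y), row by row
v <- as.matrix(paired[paste0(species, ".vol")])
b <- as.matrix(paired[paste0(species, ".bio")])
paired$Bray <- unname(rowSums(abs(v - b)) / rowSums(v + b))

# Left join onto the results table; combinations without a pair get NA
out <- merge(BCresult.ex[setdiff(names(BCresult.ex), "Bray")],
             paired[c(keys, "Bray")], by = keys, all.x = TRUE)
out
```

Note that merge() sorts the result by the key columns by default.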
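My advisor and I ended up using this for loop, since he is more familiar with it. It produced the results I was after, both for this example and for my real dataset.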
# Each row of BCresult.ex is one Workbook/Location/Method sample
for (i in seq_len(nrow(BCresult.ex))) {
  workbook.temp.ex <- BCresult.ex$Workbook[i]
  location.temp.ex <- BCresult.ex$Location[i]
  method.temp.ex <- BCresult.ex$Method[i]
  # Vol and Bio rows for this combination, species columns 5:7
  BCdata.temp.ex <- tidy[tidy$Workbook == workbook.temp.ex &
                           tidy$Location == location.temp.ex &
                           tidy$Method == method.temp.ex,
                         5:7]
  BCresult.ex[i, "Bray"] <- vegdist(BCdata.temp.ex, 'bray')
  # test <- as.numeric(vegdist(BCdata.temp.ex, 'bray'))
}
Result:
> BCresult.ex
Workbook Location Method Score Bray
1 A BigCreek K 2 0
2 A BigCreek L 3 0.294117647058824
3 A BigCreek V 1 0.157894736842105
4 A SmallCreek K 3 0.333333333333333
5 A SmallCreek L 1 0.263157894736842
6 A SmallCreek V 2 0.3
7 B BigCreek K 1 0.3
8 B BigCreek L 1 0.181818181818182
9 B BigCreek V 1 0.142857142857143
10 B SmallCreek K 1 0.444444444444444
11 B SmallCreek L 3 0.0833333333333333
12 B SmallCreek V 3 0.176470588235294
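The only problem: the Bray-Curtis values from vegan's vegdist(..., "bray") end up in the Bray column as character rather than numeric. We worked around this by exporting to .csv and reading that .csv back into R. Using as.numeric didn't seem to help.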
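vegdist() returns an object of class "dist" rather than a plain number, so one thing worth trying is converting at assignment time, e.g. BCresult.ex[i, "Bray"] <- as.numeric(vegdist(BCdata.temp.ex, "bray")). The sketch below uses base R's stats::dist() as a stand-in for vegdist(), since both return class "dist" (for two rows, Manhattan distance divided by the total count is the Bray-Curtis value). to_numeric() is a hypothetical helper, assuming the column has already come back as character or as a list of dist objects; it is not a confirmed diagnosis of the problem above.

```r
# stats::dist() stands in for vegan::vegdist(): both return class "dist".
# For two rows, Manhattan distance / total count = Bray-Curtis dissimilarity.
m <- rbind(Vol = c(Worm = 2, Larvae = 3, Swimmer = 2),
           Bio = c(Worm = 5, Larvae = 2, Swimmer = 3))
d <- stats::dist(m, method = "manhattan") / sum(m)
print(class(d))

# as.numeric() drops the "dist" class and leaves a plain double
bray_value <- as.numeric(d)

# Hypothetical helper for a Bray column that is already character (or a list
# of dist objects): flatten to an atomic vector, then convert to double
to_numeric <- function(x) as.numeric(unlist(x))
BCresult.ex <- data.frame(Method = c("K", "L"),
                          Bray = c("0", "0.294117647058824"),
                          stringsAsFactors = FALSE)
BCresult.ex$Bray <- to_numeric(BCresult.ex$Bray)
str(BCresult.ex)
```

Converting inside the loop avoids the .csv round trip entirely; to_numeric() is only needed for a column that was filled before the change.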