rowMeans function in dplyr
I have been trying to compute rowMeans inside dplyr's mutate function, but I keep getting errors. Below are an example dataset and the desired result.
DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
DATE = c("1","1","2","2","3","3","3","4","4"),
STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))
RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
DATE = c("1","1","2","2","3","3","3","4","4"),
STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
NAYSA = c(1.5, 3, 45, 60, 150, 300, 450, 7500, 9000))
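The code I wrote starts by randomly sampling STUFF and STUFF2. I then want to compute the rowMeans of STUFF and STUFF2 and write the result to a new column. I could do this with tidyr, but I would have to rework many more variables. I could also use base R, but I would prefer a solution that uses dplyr's mutate function. Thanks in advance.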
RESULT = group_by(DATA, SITE, DATE) %>%
  mutate(STUFF = sample(STUFF, replace = TRUE), STUFF2 = sample(STUFF2, replace = TRUE)) %>%
  # Each of the following attempts, tried one at a time, returns an error
  mutate(NAYSA = rowMeans(DATA[,-1:-2]))
  mutate(NAYSA = rowMeans(.[,-1:-2]))
  mutate(NAYSA = rowMeans(.))
You need dplyr's rowwise function for this. Your data is random (because of sample), so your results will differ, but you will see that it works:
library(dplyr)
group_by(DATA, SITE, DATE) %>%
mutate(STUFF=sample(STUFF,replace= TRUE), STUFF2 = sample(STUFF2,replace= TRUE))%>%
rowwise() %>%
mutate(NAYSA = mean(c(STUFF,STUFF2)))
Output:
Source: local data frame [9 x 5]
Groups: <by row>

  SITE DATE STUFF STUFF2  NAYSA
1    A    1     1      2    1.5
2    A    1     2      2    2.0
3    A    2    30     80   55.0
4    A    2    30     60   45.0
5    B    3   200    600  400.0
6    B    3   300    200  250.0
7    B    3   100    600  350.0
8    C    4  5000  12000 8500.0
9    C    4  6000  10000 8000.0
As you can see, it computes the mean of STUFF and STUFF2 for each row.
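When there are more than two columns, listing each one inside mean(c(...)) gets tedious. Here is a minimal sketch of the same rowwise idea using c_across with a tidyselect helper. It assumes dplyr 1.0.0 or later and leaves out the sampling step so that the result is reproducible.

```r
library(dplyr)

DATA <- data.frame(
  SITE   = c("A","A","A","A","B","B","B","C","C"),
  DATE   = c("1","1","2","2","3","3","3","4","4"),
  STUFF  = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000)
)

RESULT <- DATA %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into one vector
  mutate(NAYSA = mean(c_across(starts_with("STUFF")))) %>%
  # drop the row-wise grouping so later verbs behave normally
  ungroup()

print(RESULT)
```

Row-by-row evaluation is convenient but can be slow on large data frames, so the vectorised rowMeans approaches are probably the better default.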
@GregF Yes, ungroup() is the key. Thanks.
Working code:
RESULT = group_by(DATA, SITE, DATE) %>%
mutate(STUFF = sample(STUFF,replace= TRUE),
STUFF2 = sample(STUFF2,replace= TRUE)) %>%
ungroup() %>%
mutate(NAYSA = rowMeans(.[,-1:-2]))
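My understanding of why ungroup() matters: inside a grouped mutate, each expression has to return one value per row of the current group, whereas `.` is the whole 9-row data frame. As a sketch of an alternative that stays inside the grouped pipeline, the column names can be bound into a matrix, because bare column names refer to the current group's slice. The seed value is arbitrary and only there to make the sampling repeatable.

```r
library(dplyr)

DATA <- data.frame(
  SITE   = c("A","A","A","A","B","B","B","C","C"),
  DATE   = c("1","1","2","2","3","3","3","4","4"),
  STUFF  = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000)
)

set.seed(1)  # arbitrary seed, only for repeatability

RESULT <- DATA %>%
  group_by(SITE, DATE) %>%
  mutate(
    STUFF  = sample(STUFF, replace = TRUE),
    STUFF2 = sample(STUFF2, replace = TRUE),
    # cbind() builds a matrix with one row per row of the current group,
    # so rowMeans() returns a vector of exactly the group's size
    NAYSA  = rowMeans(cbind(STUFF, STUFF2))
  ) %>%
  ungroup()

print(RESULT)
```

The downside is that the columns must be named explicitly, so the ungroup() version is easier to extend to many columns.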
rowMeans requires an input with at least two dimensions, but DATA[,-1:-3] leaves only a single column, which R simplifies to a plain vector:
[1] 2 4 60 80 200 400 600 10000 12000
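To illustrate the dimension point, here is a small base R sketch comparing the default behaviour with `drop = FALSE`, which keeps a one-column selection as a data frame. The variable names are just for illustration.

```r
DATA <- data.frame(
  SITE   = c("A","A","A","A","B","B","B","C","C"),
  DATE   = c("1","1","2","2","3","3","3","4","4"),
  STUFF  = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000)
)

# Dropping columns 1 to 3 leaves one column, which is simplified to a vector
as_vector <- DATA[, -1:-3]
print(is.null(dim(as_vector)))   # a plain vector has no dim attribute

# drop = FALSE keeps the data frame structure, so rowMeans() accepts it
as_frame <- DATA[, -1:-3, drop = FALSE]
print(dim(as_frame))
one_col_means <- rowMeans(as_frame)

# Dropping only columns 1 and 2 keeps both numeric columns
two_col_means <- rowMeans(DATA[, -1:-2])
print(two_col_means)
```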
You can get the result with the following code:
DATA%>%
group_by(SITE, DATE) %>%
ungroup() %>%
mutate(NAYSA = rowMeans(.[,3:4]))
  SITE DATE STUFF STUFF2  NAYSA
1    A    1     1      2    1.5
2    A    1     2      4    3.0
3    A    2    30     60   45.0
4    A    2    40     80   60.0
5    B    3   100    200  150.0
6    B    3   200    400  300.0
7    B    3   300    600  450.0
8    C    4  5000  10000 7500.0
9    C    4  6000  12000 9000.0
Another (perhaps the best?) approach is to use map2_dbl:
library(purrr)
library(dplyr)
DATA %>%
mutate(NAYSA = map2_dbl(STUFF, STUFF2, ~mean(c(.x, .y))))
Output:
  SITE DATE STUFF STUFF2  NAYSA
1    A    1     1      2    1.5
2    A    1     2      4    3.0
3    A    2    30     60   45.0
4    A    2    40     80   60.0
5    B    3   100    200  150.0
6    B    3   200    400  300.0
7    B    3   300    600  450.0
8    C    4  5000  10000 7500.0
9    C    4  6000  12000 9000.0
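map2_dbl is limited to exactly two inputs. If more columns are involved, purrr's pmap_dbl can iterate over the rows of a data frame instead. This is only a sketch: `stuff_cols` is a helper name I made up, and the anonymous function simply collects whatever columns it receives.

```r
library(dplyr)
library(purrr)

DATA <- data.frame(
  SITE   = c("A","A","A","A","B","B","B","C","C"),
  DATE   = c("1","1","2","2","3","3","3","4","4"),
  STUFF  = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000)
)

# Hypothetical helper: the columns to average, selected by name prefix
stuff_cols <- select(DATA, starts_with("STUFF"))

RESULT <- DATA %>%
  # pmap_dbl() calls the function once per row, passing each column as an argument
  mutate(NAYSA = pmap_dbl(stuff_cols, function(...) mean(c(...))))

print(RESULT)
```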
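Now that dplyr has introduced across (in version 1.0.0), this can be done with across and base R's rowMeans. The following code takes the row-wise mean of the columns whose names start with the string "STUFF":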
DATA %>%
mutate(NAYSA = rowMeans(across(starts_with("STUFF"))))
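As far as I know, dplyr 1.1.0 and later recommend pick() rather than across() when no function is applied to the selected columns. The sketch below assumes that version or newer and checks the result against the desired NAYSA column from the question. The `na.rm = TRUE` argument is my addition and has no effect on this data, which contains no missing values.

```r
library(dplyr)

DATA <- data.frame(
  SITE   = c("A","A","A","A","B","B","B","C","C"),
  DATE   = c("1","1","2","2","3","3","3","4","4"),
  STUFF  = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000)
)

RESULT <- DATA %>%
  # pick() returns the selected columns as a data frame, which rowMeans() accepts
  mutate(NAYSA = rowMeans(pick(starts_with("STUFF")), na.rm = TRUE))

# Desired values taken from the question
expected <- c(1.5, 3, 45, 60, 150, 300, 450, 7500, 9000)
print(all.equal(as.numeric(RESULT$NAYSA), expected))
```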