Rounding error when grouping by multiple categories
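Why are the values of SE_daily wrong? I would expect it to round to the nearest integer (although I want a decimal), but the decimal answers are just plain wrong. What am I missing?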
csv <- csv %>%
  group_by(id_num) %>%
  group_by(Month) %>%
  group_by(Day) %>%
  mutate(SE_daily = mean(SelfEsteem, na.rm = TRUE))
head(csv[,c(1:5,28,181)])
> head(csv[,c(1:5,28,181)])
Source: local data frame [6 x 7]
Groups: Day [3]
    X.1     X id_num Month   Day SelfEsteem SE_daily
  <int> <int>  <int> <int> <int>      <int>    <dbl>
1     1     1     29     2    19          4 3.457944  #mean(4,4,3)= 4, expected answer= 3.66666666667
2     2     2     29     2    19          4 3.457944
3     3     3     29     2    19          3 3.457944
4     4     4     29     2    20          4 3.424242  #expected answer= 4
5     5     5     29     2    21          4 3.318182  #expected answer=4
6     6     6     29     2    21          4 3.318182
Head of the csv output:
structure(list(X.1 = 1:6, X = 1:6,
id_num = c(29L, 29L, 29L, 29L, 29L, 29L),
Month = c(2L, 2L, 2L, 2L, 2L, 2L),
Day = c(19L, 19L, 19L, 20L, 21L, 21L),
SelfEsteem = c(4L, 4L, 3L, 4L, 4L, 4L),
SE_daily = c(3.45794392523365, 3.45794392523365, 3.45794392523365, 3.42424242424242, 3.31818181818182, 3.31818181818182)),
.Names = c("X.1", "X", "id_num", "Month", "Day", "SelfEsteem", "SE_daily"),
row.names = c(NA, -6L),
class = "data.frame")
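I get the expected output for SE_daily when all the grouping variables go into a single group_by call. The likely cause is that you piped the group_by commands one after another instead of placing them in a single command: by default each group_by call replaces the previous grouping, so only Day is in effect (note the "Groups: Day [3]" line in your output), and each mean is taken over every id_num and Month that share a common Day (assuming the data structure provided is just a subset of the entire dataset).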
library(dplyr)
csv %>%
  group_by(id_num, Month, Day) %>%
  mutate(SE_daily = mean(SelfEsteem, na.rm = TRUE))
Output
Source: local data frame [6 x 7]
Groups: id_num, Month, Day [3]
    X.1     X id_num Month   Day SelfEsteem SE_daily
  <int> <int>  <int> <int> <int>      <int>    <dbl>
1     1     1     29     2    19          4 3.666667
2     2     2     29     2    19          4 3.666667
3     3     3     29     2    19          3 3.666667
4     4     4     29     2    20          4 4.000000
5     5     5     29     2    21          4 4.000000
6     6     6     29     2    21          4 4.000000
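To make the difference between chained and combined grouping concrete, here is a minimal sketch on a made-up data frame. The `toy` data and the second participant (`id_num` 30) are assumptions for illustration only and are not taken from the original dataset; the point is just that two participants sharing the same Day get pooled together when only the last group_by call survives.

```r
library(dplyr)

# Hypothetical toy data (not from the question): two participants
# who both have observations on Day 19 of Month 2.
toy <- data.frame(
  id_num     = c(29L, 29L, 29L, 30L, 30L),
  Month      = c(2L, 2L, 2L, 2L, 2L),
  Day        = c(19L, 19L, 19L, 19L, 19L),
  SelfEsteem = c(4L, 4L, 3L, 1L, 2L)
)

# Chained calls: each group_by() replaces the previous grouping,
# so the mean is taken over every row that shares the same Day.
chained <- toy %>%
  group_by(id_num) %>%
  group_by(Month) %>%
  group_by(Day) %>%
  mutate(SE_daily = mean(SelfEsteem, na.rm = TRUE))

group_vars(chained)   # only "Day" is left as a grouping variable

# Single call: one group per id_num / Month / Day combination,
# so each participant gets their own daily mean.
combined <- toy %>%
  group_by(id_num, Month, Day) %>%
  mutate(SE_daily = mean(SelfEsteem, na.rm = TRUE))

group_vars(combined)  # "id_num" "Month" "Day"
```

In this sketch the chained version pools all five rows into one mean, while the combined version computes a separate mean for each participant. If the groupings really do need to be added step by step, dplyr 1.0.0 and later accept group_by(..., .add = TRUE) to append to the existing grouping (older versions called the argument add).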