Finding the mean of a group, based on the most recent date of each group
Sorry for the unclear title. My question is simple, but hard to put into words.
Suppose I have this sample dataset:
Person | Date (m/d/y) | Weight |
---|---|---|
Person1 | 01/15/21 | 93 |
Person2 | 01/16/21 | 87 |
Person3 | 01/14/21 | 73 |
Person1 | 01/17/21 | 95 |
Person2 | 01/15/21 | 85 |
Person3 | 01/18/21 | 73.5 |
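In R, how can I find the mean of the weights of Person1, Person2 and Person3, keeping in mind that only each person's most recent weight matters?
So the correct answer should be:
- Person1 (01/17/21) weight = 95;
- Person2 (01/16/21) weight = 87;
- Person3 (01/18/21) weight = 73.5;
Mean = 85.2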
One option is to group by Person, slice the row with the latest date in each group, and then take the mean:
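library(dplyr)
df1 %>%
  group_by(Person) %>%
  # keep the row with the latest parsed date within each person
  slice(which.max(as.Date(`Date (m/d/y)`, format = '%m/%d/%y'))) %>%
  ungroup() %>%
  # average the remaining (most recent) weights
  summarise(Weight = mean(Weight, na.rm = TRUE))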
-output
# A tibble: 1 x 1
# Weight
# <dbl>
#1 85.2
data
df1 <- structure(list(Person = c("Person1", "Person2", "Person3", "Person1",
"Person2", "Person3"), `Date (m/d/y)` = c("01/15/21", "01/16/21",
"01/14/21", "01/17/21", "01/15/21", "01/18/21"), Weight = c(93,
87, 73, 95, 85, 73.5)), class = "data.frame", row.names = c(NA,
-6L))
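A variant of the same idea, in case it reads more clearly: parse the date into its own column first, then pick the newest row per person with `slice_max()`. This is only a sketch, assuming dplyr 1.0.0 or later (where `slice_max()` is available); the column name `date_parsed` and the result name `latest_mean` are made up here for illustration.

```r
library(dplyr)

# Same sample data as df1 above, rebuilt so the block runs on its own.
df1 <- data.frame(
  Person = c("Person1", "Person2", "Person3", "Person1", "Person2", "Person3"),
  `Date (m/d/y)` = c("01/15/21", "01/16/21", "01/14/21",
                     "01/17/21", "01/15/21", "01/18/21"),
  Weight = c(93, 87, 73, 95, 85, 73.5),
  check.names = FALSE
)

latest_mean <- df1 %>%
  # date_parsed is a hypothetical helper column holding real Date values
  mutate(date_parsed = as.Date(`Date (m/d/y)`, format = "%m/%d/%y")) %>%
  group_by(Person) %>%
  # one row per person: the one with the newest date (ties broken arbitrarily)
  slice_max(date_parsed, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  summarise(Weight = mean(Weight, na.rm = TRUE))

latest_mean
```

Parsing the date once up front also makes it easier to check that the `%m/%d/%y` format matched, since unparsed values show up as `NA` in `date_parsed`.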
Here is a data.table option
library(data.table)
setDT(df)[
  ,
  # weight on the latest date, computed per Person (result column is V1)
  Weight[which.max(as.Date(`Date (m/d/y)`, format = "%m/%d/%y"))],
  Person
][
  ,
  mean(V1)
]
gives
[1] 85.16667
data
> dput(df)
structure(list(Person = c("Person1", "Person2", "Person3", "Person1",
"Person2", "Person3"), `Date (m/d/y)` = c("01/15/21", "01/16/21",
"01/14/21", "01/17/21", "01/15/21", "01/18/21"), Weight = c(93,
87, 73, 95, 85, 73.5)), class = "data.frame", row.names = c(NA,
-6L))
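If you would rather not load any package, the same "latest row per person, then mean" logic can be sketched in base R. This is just one possible way, not something from the answers above; `date_parsed`, `ordered`, `latest` and `latest_mean` are names introduced here for illustration.

```r
# Same sample data as above, rebuilt so the block runs on its own.
df <- data.frame(
  Person = c("Person1", "Person2", "Person3", "Person1", "Person2", "Person3"),
  `Date (m/d/y)` = c("01/15/21", "01/16/21", "01/14/21",
                     "01/17/21", "01/15/21", "01/18/21"),
  Weight = c(93, 87, 73, 95, 85, 73.5),
  check.names = FALSE
)

# Parse the text dates into Date values so they sort chronologically.
df$date_parsed <- as.Date(df[["Date (m/d/y)"]], format = "%m/%d/%y")

# Sort newest first, then keep the first row seen for each person.
ordered <- df[order(df$date_parsed, decreasing = TRUE), ]
latest <- ordered[!duplicated(ordered$Person), ]

# Mean of the most recent weights only.
latest_mean <- mean(latest$Weight)
latest_mean
```

On the sample data this should agree with the 85.16667 shown above.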