Finding the mean of a group, based on the most recent date of each group

Sorry for the unclear title. My question is simple, but hard to put into words. Suppose I have this sample dataset:

Person Date (m/d/y) Weight
Person1 01/15/21 93
Person2 01/16/21 87
Person3 01/14/21 73
Person1 01/17/21 95
Person2 01/15/21 85
Person3 01/18/21 73.5

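In R, how can I find the mean weight of Person1, Person2, and Person3, keeping in mind that only each person's most recent weight matters?

So the correct answer should be:

Mean = 85.2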

One option is to group by Person, slice the row with the latest date in each group, and then take the mean of Weight

library(dplyr)
df1 %>%
  group_by(Person) %>%
  slice(which.max(as.Date(`Date (m/d/y)`, '%m/%d/%y'))) %>%
  ungroup %>%
  summarise(Weight = mean(Weight, na.rm = TRUE))

-output

# A tibble: 1 x 1
#  Weight
#   <dbl>
#1   85.2

data

df1 <- structure(list(Person = c("Person1", "Person2", "Person3", "Person1", 
"Person2", "Person3"), `Date (m/d/y)` = c("01/15/21", "01/16/21", 
"01/14/21", "01/17/21", "01/15/21", "01/18/21"), Weight = c(93, 
87, 73, 95, 85, 73.5)), class = "data.frame", row.names = c(NA, 
-6L))
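
A short sketch of why the date column is passed through as.Date before which.max: the dates are stored as m/d/y strings, and strings compare character by character rather than chronologically. The two dates below are hypothetical (they are not in the sample data) and were chosen to span a year boundary.

```r
# Hypothetical dates (not from the sample data) that span a year boundary
d <- c("12/31/20", "01/15/21")

# As plain strings, comparison is character by character, so "12/31/20" wins
latest_as_string <- max(d)

# Parsed with the m/d/y format, the 2021 date is correctly the most recent
parsed <- as.Date(d, format = "%m/%d/%y")
latest_as_date <- d[which.max(parsed)]

print(latest_as_string)
print(latest_as_date)
```

In the sample data all dates fall in the same month and year, so the difference would not show up there, but parsing keeps the result correct for data that crosses months or years.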

Here is a data.table option

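library(data.table)

# Per Person, take the Weight on the latest date (the result column is V1),
# then average those values. Note that setDT() converts df by reference.
setDT(df)[
  ,
  Weight[which.max(as.Date(`Date (m/d/y)`, format = "%m/%d/%y"))],
  by = Person
][
  ,
  mean(V1)
]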

which gives

[1] 85.16667

Data

> dput(df)
structure(list(Person = c("Person1", "Person2", "Person3", "Person1",
"Person2", "Person3"), `Date (m/d/y)` = c("01/15/21", "01/16/21",
"01/14/21", "01/17/21", "01/15/21", "01/18/21"), Weight = c(93,
87, 73, 95, 85, 73.5)), class = "data.frame", row.names = c(NA,
-6L))
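
For comparison, the same idea (keep each person's latest row, then average) can be sketched in base R without extra packages. This is only an illustration, not one of the answers above: the date column is renamed to `Date` here as an assumption for readability, and `latest` is a helper name introduced for this sketch.

```r
# Sample data from the question; the date column is renamed to `Date`
# here (an assumption for this sketch, the original name is `Date (m/d/y)`)
df1 <- data.frame(
  Person = c("Person1", "Person2", "Person3", "Person1", "Person2", "Person3"),
  Date   = c("01/15/21", "01/16/21", "01/14/21", "01/17/21", "01/15/21", "01/18/21"),
  Weight = c(93, 87, 73, 95, 85, 73.5)
)

# Parse the m/d/y strings so that dates compare chronologically
df1$Date <- as.Date(df1$Date, format = "%m/%d/%y")

# Split the rows by person and keep the weight recorded on the latest date
latest <- vapply(
  split(df1, df1$Person),
  function(g) g$Weight[which.max(g$Date)],
  numeric(1)
)

# Average the most recent weights (95, 87 and 73.5)
print(mean(latest))
# [1] 85.16667
```

The per-person values are 95, 87, and 73.5, so the mean is 255.5 / 3, which rounds to the 85.2 expected in the question.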