How to select the 'x' most recent values in each group in R?
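I am trying to select/filter the most recent values in each group of an R data frame. For example, I would like to select the 3 most recent values (i.e. the dates closest to today) for each player name in the following data frame: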
Player Date Result
Sam 03/15/2015 1
Sam 03/22/2015 0
Sam 04/04/2015 2
Sam 04/12/2015 1
Sam 04/18/2015 1
Sam 04/26/2015 0
Sam 08/08/2015 3
Steve 02/17/2015 0
Steve 02/21/2015 0
Steve 03/04/2015 4
Steve 03/11/2015 2
Steve 03/15/2015 1
Steve 03/22/2015 0
Steve 04/12/2015 0
Steve 04/18/2015 2
Steve 04/26/2015 1
Steve 04/29/2015 2
Steve 08/16/2015 4
Jasper 03/15/2015 3
Jasper 03/22/2015 3.5
Jasper 04/04/2015 4
Jasper 04/12/2015 4
Jasper 04/18/2015 5
Jasper 04/26/2015 0
I have already written the as.Date() code, so R now understands the date format, but what code can I use to select only the 3 (say) most recent values in each group?
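We can use dplyr. We convert 'Date' to the Date class with as.Date. After grouping by 'Player', we arrange the 'Date' column in descending order and use slice to get the 3 most recent values. If we don't want to change the class of 'Date', we can drop the mutate step and do the conversion inside arrange, i.e. arrange(desc(as.Date(Date, '%m/%d/%Y'))).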
library(dplyr)
df1 %>%
  mutate(Date = as.Date(Date, '%m/%d/%Y')) %>%
  group_by(Player) %>%
  arrange(desc(Date)) %>%
  slice(1:3)
# Player Date Result
#1 Jasper 2015-04-26 0
#2 Jasper 2015-04-18 5
#3 Jasper 2015-04-12 4
#4 Sam 2015-08-08 3
#5 Sam 2015-04-26 0
#6 Sam 2015-04-18 1
#7 Steve 2015-08-16 4
#8 Steve 2015-04-29 2
#9 Steve 2015-04-26 1
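The variant mentioned above, where the conversion happens inside arrange and the 'Date' column stays as text, might look like the sketch below. The small df1 here is a cut-down stand-in (an assumption made only to keep the example self-contained) for the full data given at the end of the answer, and res is just an illustrative name.

```r
library(dplyr)

# Cut-down stand-in for df1: same columns, fewer rows.
df1 <- data.frame(
  Player = c("Sam", "Sam", "Sam", "Sam", "Steve", "Steve", "Steve", "Steve"),
  Date   = c("04/12/2015", "04/18/2015", "04/26/2015", "08/08/2015",
             "04/18/2015", "04/26/2015", "04/29/2015", "08/16/2015"),
  Result = c(1, 1, 0, 3, 2, 1, 2, 4),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(Player) %>%
  # Sort on the parsed date, but leave the Date column itself as character.
  arrange(desc(as.Date(Date, '%m/%d/%Y'))) %>%
  # Keep the first 3 rows of each group, i.e. the 3 latest dates.
  slice(1:3) %>%
  ungroup()

print(res)
```

Sorting on the parsed date matters here: sorting the raw "mm/dd/yyyy" strings would compare them alphabetically, which only happens to agree with date order while all rows fall in the same year.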
Alternatively, after grouping by 'Player', we can use top_n, specifying 'n' and the 'wt' variable to order by.
df1 %>%
  mutate(Date = as.Date(Date, '%m/%d/%Y')) %>%
  group_by(Player) %>%
  top_n(n = 3, Date)
# Player Date Result
#1 Sam 2015-04-18 1
#2 Sam 2015-04-26 0
#3 Sam 2015-08-08 3
#4 Steve 2015-04-26 1
#5 Steve 2015-04-29 2
#6 Steve 2015-08-16 4
#7 Jasper 2015-04-12 4
#8 Jasper 2015-04-18 5
#9 Jasper 2015-04-26 0
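In dplyr 1.0.0 and later, top_n is superseded by slice_max, so on a recent installation the same idea could be sketched as below. This assumes dplyr >= 1.0.0; the small df1 is a cut-down stand-in for the full data, and res is an illustrative name.

```r
library(dplyr)

# Cut-down stand-in for df1: same columns, fewer rows.
df1 <- data.frame(
  Player = c("Sam", "Sam", "Sam", "Sam", "Steve", "Steve", "Steve", "Steve"),
  Date   = c("04/12/2015", "04/18/2015", "04/26/2015", "08/08/2015",
             "04/18/2015", "04/26/2015", "04/29/2015", "08/16/2015"),
  Result = c(1, 1, 0, 3, 2, 1, 2, 4),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  mutate(Date = as.Date(Date, '%m/%d/%Y')) %>%
  group_by(Player) %>%
  # with_ties = FALSE caps the result at 3 rows per group even if dates repeat.
  slice_max(order_by = Date, n = 3, with_ties = FALSE) %>%
  ungroup()

print(res)
```

By default both top_n and slice_max keep tied rows, so a group can return more than 3 rows when several rows share the same date; with_ties = FALSE is one way to avoid that.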
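Using data.table, we convert the 'data.frame' to a 'data.table' (setDT(df1)). We order the rows by 'Date' in descending order after converting it to a date class (as.IDate), group by 'Player', and use head to get the first 3 rows of each group.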
library(data.table)
setDT(df1)[order(-as.IDate(Date, '%m/%d/%Y')), head(.SD, 3), by = Player]
# Player Date Result
#1: Steve 08/16/2015 4
#2: Steve 04/29/2015 2
#3: Steve 04/26/2015 1
#4: Sam 08/08/2015 3
#5: Sam 04/26/2015 0
#6: Sam 04/18/2015 1
#7: Jasper 04/26/2015 0
#8: Jasper 04/18/2015 5
#9: Jasper 04/12/2015 4
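Note that setDT converts df1 in place, and the output above still shows 'Date' as text because the conversion only happens inside order. If a converted date column is wanted and df1 should stay untouched, one possible sketch is to work on a copy. The names dt and res, and the cut-down df1, are assumptions made for illustration.

```r
library(data.table)

# Cut-down stand-in for df1: same columns, fewer rows.
df1 <- data.frame(
  Player = c("Sam", "Sam", "Sam", "Sam", "Steve", "Steve", "Steve", "Steve"),
  Date   = c("04/12/2015", "04/18/2015", "04/26/2015", "08/08/2015",
             "04/18/2015", "04/26/2015", "04/29/2015", "08/16/2015"),
  Result = c(1, 1, 0, 3, 2, 1, 2, 4),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so df1 itself remains a plain data.frame.
dt <- as.data.table(df1)

# Convert the column once, by reference, on the copy only.
dt[, Date := as.IDate(Date, format = '%m/%d/%Y')]

# Newest rows first, then the first 3 rows within each Player.
res <- dt[order(Date, decreasing = TRUE), head(.SD, 3), by = Player]

print(res)
```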
Data
df1 <- structure(list(Player = c("Sam", "Sam", "Sam", "Sam", "Sam",
"Sam", "Sam", "Steve", "Steve", "Steve", "Steve", "Steve", "Steve",
"Steve", "Steve", "Steve", "Steve", "Steve", "Jasper", "Jasper",
"Jasper", "Jasper", "Jasper", "Jasper"), Date = c("03/15/2015",
"03/22/2015", "04/04/2015", "04/12/2015", "04/18/2015", "04/26/2015",
"08/08/2015", "02/17/2015", "02/21/2015", "03/04/2015", "03/11/2015",
"03/15/2015", "03/22/2015", "04/12/2015", "04/18/2015", "04/26/2015",
"04/29/2015", "08/16/2015", "03/15/2015", "03/22/2015", "04/04/2015",
"04/12/2015", "04/18/2015", "04/26/2015"), Result = c(1, 0, 2,
1, 1, 0, 3, 0, 0, 4, 2, 1, 0, 0, 2, 1, 2, 4, 3, 3.5, 4, 4, 5,
0)), .Names = c("Player", "Date", "Result"),
class = "data.frame", row.names = c(NA, -24L))