Set first calendrical occurrence of a date as numeric "1" and then move on day by day

My dataset looks like this:

game_data <- data.frame(player = c(1,1,1,1,2,2,2,2), dateday = c("2015-04-08","2015-05-08","2015-05-10","2015-06-28","2015-09-01","2015-09-02","2015-09-03","2015-10-11"), points = c(20,80,140,230,40,60,98,102))

game_data
  player    dateday points
1      1 2015-04-08     20
2      1 2015-05-08     80
3      1 2015-05-10    140
4      1 2015-06-28    230
5      2 2015-09-01     40
6      2 2015-09-02     60
7      2 2015-09-03     98
8      2 2015-10-11    102

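I would like a dataset with one observation per player per date, where each player's first date entry is labelled "1" and every later date is counted on from it day by day.

It should look like this (hopefully I counted correctly...):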

game_data_new <- data.frame(player = c(1,1,1,1,2,2,2,2), dateday = c(1,2,4,53,1,2,3,41), points = c(20,80,140,230,40,60,98,102))

game_data_new
  player dateday points
1      1       1     20
2      1       2     80
3      1       4    140
4      1      53    230
5      2       1     40
6      2       2     60
7      2       3     98
8      2      41    102

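This is straightforward with the dplyr package. Convert dateday to a Date object, which supports subtracting two dates to get the difference in days. Then, within each player, take the number of days since that player's earliest date (day 0) and add 1.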

library(dplyr)
game_data_new <- game_data %>% 
  mutate(dateday = as.Date(dateday)) %>% 
  group_by(player) %>% 
  mutate(dateday = 1 + as.numeric(dateday - min(dateday)))
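To make the date arithmetic in the pipeline concrete, here is a minimal base R sketch of the single step it relies on. The variable names (first_day, later_day, gap, day_number) are only illustrative; the two dates are player 1's first two entries from the question.

```r
# Subtracting two Date objects gives a "difftime" measured in days.
first_day <- as.Date("2015-04-08")
later_day <- as.Date("2015-05-08")
gap <- later_day - first_day

# Convert explicitly to a plain number of days,
# then add 1 so that the first day itself counts as day 1.
day_number <- as.numeric(gap, units = "days") + 1
print(day_number)  # 31
```

By the same arithmetic, player 1's rows come out as 1, 31, 33, 82 and player 2's as 1, 2, 3, 41, so the hand-counted values for player 1 in the question differ from what the code returns.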

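Base R solution:

# diff() gives the gap to the previous row; cumsum() turns those gaps into days since the player's first row
game_data$dateday <- 1 + as.numeric(ave(game_data$dateday, game_data$player, FUN = function(days) cumsum(c(0, diff(as.Date(days, format = "%Y-%m-%d"))))))
#[1]  1 31 33 82  1  2  3 41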

Data, created with stringsAsFactors = FALSE so that dateday stays a character vector:

game_data <- data.frame(
    player = c(1,1,1,1,2,2,2,2),
    dateday = c("2015-04-08","2015-05-08","2015-05-10","2015-06-28","2015-09-01","2015-09-02","2015-09-03","2015-10-11"),
    points = c(20,80,140,230,40,60,98,102),
    stringsAsFactors = FALSE)
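
If the rows are not guaranteed to be sorted by date within each player, one possible variant is to subtract each player's earliest date directly instead of relying on row order. This is only a sketch in base R; the helper names (dates, first_date, day_number) are made up for illustration, and the result is stored in a new vector rather than overwriting dateday.

```r
game_data <- data.frame(
    player = c(1,1,1,1,2,2,2,2),
    dateday = c("2015-04-08","2015-05-08","2015-05-10","2015-06-28",
                "2015-09-01","2015-09-02","2015-09-03","2015-10-11"),
    points = c(20,80,140,230,40,60,98,102),
    stringsAsFactors = FALSE)

# Dates as plain day counts, so that ave() works on an ordinary numeric vector.
dates <- as.numeric(as.Date(game_data$dateday))

# Earliest date per player, repeated for each of that player's rows.
first_date <- ave(dates, game_data$player, FUN = min)

# Days since the player's first date, shifted so the first day is 1.
day_number <- dates - first_date + 1
print(day_number)
```

Because every row is compared with the group minimum, shuffling the rows changes only the order of the result, not the values.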