R programming - assigning an index to number the instances of date and character variable

Question

I have a data table with two variables, Date and Team:

library(data.table)

Date <- c("2016-11-20", "2016-11-20", "2016-11-20", "2016-11-21", "2016-11-21", "2016-11-21", "2016-11-22", "2016-11-22", "2016-11-22", "2016-11-22")
Team <- c("NYK", "CLE", "DET", "DET", "ATL", "BRK", "CLE", "DET", "NYK", "TOR")
DT <- data.table(Date, Team)
DT$Date <- as.Date(Date)

The data table ends up looking like this:

          Date Team
 1: 2016-11-20  NYK
 2: 2016-11-20  CLE
 3: 2016-11-20  DET
 4: 2016-11-21  DET
 5: 2016-11-21  ATL
 6: 2016-11-21  BRK
 7: 2016-11-22  CLE
 8: 2016-11-22  DET
 9: 2016-11-22  NYK
10: 2016-11-22  TOR

What I'd like to do is add an index column that says how many times that team has appeared so far. It would look like this:

          Date Team  gamenum
 1: 2016-11-20  NYK     1
 2: 2016-11-20  CLE     1
 3: 2016-11-20  DET     1
 4: 2016-11-21  DET     2
 5: 2016-11-21  ATL     1
 6: 2016-11-21  BRK     1
 7: 2016-11-22  CLE     2
 8: 2016-11-22  DET     3
 9: 2016-11-22  NYK     2
10: 2016-11-22  TOR     1

I thought the code would look like something I found in other posts:

NewDT <- DT[, ':='(Date = .N, gamenum = 1:.N), by = Team]

But it gives me an error:

Error in `[.data.table`(DT, , `:=`(Date = .N, gamenum = 1:.N), by = Team) : 
  Type of RHS ('integer') must match LHS ('double'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)

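My understanding is that the classes don't match, but I don't know how to make this work without adding extra, unnecessary data. Thanks in advance.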

Answer 1

This isn't pure data.table, but it works:

library(data.table); library(purrr); library(dplyr); library(magrittr)
DT <- fread("ID    Date       Team
 1: 2016-11-20  NYK
            2: 2016-11-20  CLE
            3: 2016-11-20  DET
            4: 2016-11-21  DET
            5: 2016-11-21  ATL
            6: 2016-11-21  BRK
            7: 2016-11-22  CLE
            8: 2016-11-22  DET
            9: 2016-11-22  NYK
            10: 2016-11-22  TOR")
DT$ID %<>% gsub(":", "", .)

DT %>% split(.$Team) %>% 
    purrr::map(~ mutate(., game_num = frank(Date))) %>%
    bind_rows() %>%
    arrange(as.numeric(ID))

   ID       Date Team game_num
1   1 2016-11-20  NYK        1
2   2 2016-11-20  CLE        1
3   3 2016-11-20  DET        1
4   4 2016-11-21  DET        2
5   5 2016-11-21  ATL        1
6   6 2016-11-21  BRK        1
7   7 2016-11-22  CLE        2
8   8 2016-11-22  DET        3
9   9 2016-11-22  NYK        2
10 10 2016-11-22  TOR        1

If you are happy with arrange(Date, Team), you can drop the DT$ID adjustment, but the order won't be exactly the one you asked for.
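As a hedged alternative to the split/bind approach above, here is a minimal dplyr sketch that keeps the original row order without an ID column. The names `games` and `result` are illustrative and not from the question. It counts rows in their current order rather than ranking by Date, so it assumes the data is already sorted by date and that each team appears at most once per date.

```r
library(dplyr)

# Illustrative copy of the question's data as a plain data frame.
games <- data.frame(
  Date = as.Date(c("2016-11-20", "2016-11-20", "2016-11-20", "2016-11-21",
                   "2016-11-21", "2016-11-21", "2016-11-22", "2016-11-22",
                   "2016-11-22", "2016-11-22")),
  Team = c("NYK", "CLE", "DET", "DET", "ATL", "BRK", "CLE", "DET", "NYK", "TOR")
)

# group_by() + mutate() adds a per-team counter without reordering rows,
# so no ID column or final arrange() is needed.
result <- games %>%
  group_by(Team) %>%
  mutate(game_num = row_number()) %>%
  ungroup()
```

A grouped mutate() leaves rows where they were, which is why the ID bookkeeping can be dropped here.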

Answer 2

Try this:

DT$gamenum <- sapply(seq_along(DT$Team), function(x) sum(DT[1:x, Team] %in% DT[x, Team]))
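The sapply() call counts, for each row x, how many of rows 1 through x share that row's team, so it rescans the table once per row. As a rough base-R sketch of the same running count in a single grouped pass, assuming the Team vector from the question, ave() can be used:

```r
Team <- c("NYK", "CLE", "DET", "DET", "ATL", "BRK", "CLE", "DET", "NYK", "TOR")

# ave() splits the row positions by Team, applies seq_along() within each
# group, and puts the results back in the original row order.
gamenum <- ave(seq_along(Team), Team, FUN = seq_along)
```

The result can then be assigned as a column, for example with `DT$gamenum <- gamenum`.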

Answer 3

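I really don't think you want to assign .N to Date. You probably meant the two added columns to be the sequence number and the number of games for that Team: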

DT[, ':='(gamenum = 1:.N, no_of_games = .N), by = Team]

Giving:

> DT
          Date Team gamenum no_of_games
 1: 2016-11-20  NYK       1           2
 2: 2016-11-20  CLE       1           2
 3: 2016-11-20  DET       1           3
 4: 2016-11-21  DET       2           3
 5: 2016-11-21  ATL       1           1
 6: 2016-11-21  BRK       1           1
 7: 2016-11-22  CLE       2           2
 8: 2016-11-22  DET       3           3
 9: 2016-11-22  NYK       2           2
10: 2016-11-22  TOR       1           1
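For completeness, here is a minimal sketch of why the original call failed and of two ways to get just the counter. It assumes the rows are already sorted by Date, as in the question; `gamenum2` is an illustrative column name added only for comparison.

```r
library(data.table)

Date <- c("2016-11-20", "2016-11-20", "2016-11-20", "2016-11-21", "2016-11-21",
          "2016-11-21", "2016-11-22", "2016-11-22", "2016-11-22", "2016-11-22")
Team <- c("NYK", "CLE", "DET", "DET", "ATL", "BRK", "CLE", "DET", "NYK", "TOR")
DT <- data.table(Date = as.Date(Date), Team = Team)

# A Date column is stored as a double, while .N is an integer; assigning
# .N to Date is the type mismatch the error message complains about.
date_storage <- typeof(DT$Date)

# Running count of appearances within each team, in current row order.
DT[, gamenum := seq_len(.N), by = Team]

# rowid() computes the same within-group counter without a by clause.
DT[, gamenum2 := rowid(Team)]
```

If the table were not already in date order, calling `setorder(DT, Date)` first would make the counter follow the calendar.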
