R programming - assigning an index that numbers each occurrence of a character variable in date order
I have a data.table with two variables, Date and Team:
library(data.table)
Date <- c("2016-11-20", "2016-11-20", "2016-11-20", "2016-11-21", "2016-11-21", "2016-11-21", "2016-11-22", "2016-11-22", "2016-11-22", "2016-11-22")
Team <- c("NYK", "CLE", "DET", "DET", "ATL", "BRK", "CLE", "DET", "NYK", "TOR")
DT <- data.table(Date, Team)
# convert the character dates to Date class
DT$Date <- as.Date(Date)
The data.table ends up looking like this:
Date Team
1: 2016-11-20 NYK
2: 2016-11-20 CLE
3: 2016-11-20 DET
4: 2016-11-21 DET
5: 2016-11-21 ATL
6: 2016-11-21 BRK
7: 2016-11-22 CLE
8: 2016-11-22 DET
9: 2016-11-22 NYK
10: 2016-11-22 TOR
What I want to do is add an index column that says how many times that team has appeared so far. It would look like this:
Date Team gamenum
1: 2016-11-20 NYK 1
2: 2016-11-20 CLE 1
3: 2016-11-20 DET 1
4: 2016-11-21 DET 2
5: 2016-11-21 ATL 1
6: 2016-11-21 BRK 1
7: 2016-11-22 CLE 2
8: 2016-11-22 DET 3
9: 2016-11-22 NYK 2
10: 2016-11-22 TOR 1
I thought the code would look something like what I found in other posts:
NewDT <- DT[, ':='(Date = .N, gamenum = 1:.N), by = Team]
But it gives me an error:
Error in `[.data.table`(DT, , `:=`(Date = .N, gamenum = 1:.N), by = Team) :
Type of RHS ('integer') must match LHS ('double'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)
My understanding is that the classes don't match, but I don't know how to make this work without adding extra, unnecessary data. Thanks in advance.
This isn't entirely data.table, but it works:
library(data.table); library(purrr); library(dplyr); library(magrittr)
DT <- fread("ID Date Team
1: 2016-11-20 NYK
2: 2016-11-20 CLE
3: 2016-11-20 DET
4: 2016-11-21 DET
5: 2016-11-21 ATL
6: 2016-11-21 BRK
7: 2016-11-22 CLE
8: 2016-11-22 DET
9: 2016-11-22 NYK
10: 2016-11-22 TOR")
DT$ID %<>% gsub(":", "", .)
DT %>% split(.$Team) %>%
purrr::map(~ mutate(., game_num = frank(Date))) %>%
bind_rows() %>%
arrange(as.numeric(ID))
ID Date Team game_num
1 1 2016-11-20 NYK 1
2 2 2016-11-20 CLE 1
3 3 2016-11-20 DET 1
4 4 2016-11-21 DET 2
5 5 2016-11-21 ATL 1
6 6 2016-11-21 BRK 1
7 7 2016-11-22 CLE 2
8 8 2016-11-22 DET 3
9 9 2016-11-22 NYK 2
10 10 2016-11-22 TOR 1
If you are happy with arrange(Date, Team), you can drop the DT$ID adjustment, but the order won't be exactly the one you wanted.
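The split/map/bind_rows pipeline above can probably be shortened with a grouped mutate. The following is only a sketch, assuming the rows are already sorted by Date (as they are in the question); the names `games` and `games_numbered` are made up for illustration. Because `group_by()` followed by `mutate()` keeps the original row order, no ID column is needed.

```r
library(dplyr)

# Same ten rows as in the question, rebuilt as a plain data frame
games <- data.frame(
  Date = as.Date(c("2016-11-20", "2016-11-20", "2016-11-20",
                   "2016-11-21", "2016-11-21", "2016-11-21",
                   "2016-11-22", "2016-11-22", "2016-11-22", "2016-11-22")),
  Team = c("NYK", "CLE", "DET", "DET", "ATL", "BRK", "CLE", "DET", "NYK", "TOR"),
  stringsAsFactors = FALSE
)

# row_number() counts 1, 2, ... within each Team, in current row order
games_numbered <- games %>%
  group_by(Team) %>%
  mutate(game_num = row_number()) %>%
  ungroup()

print(games_numbered)
```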
Try this:
DT$gamenum <- sapply(seq(DT$Team), function(x) sum(DT[1:x,Team] %in% DT[x,Team]))
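The sapply line rescans all earlier rows for every row, so it may get slow on a large table. A base R sketch that should give the same running count in one pass uses `ave()` with `seq_along`; it assumes, as above, that the rows are already in date order.

```r
# Team column from the question, in date order
Team <- c("NYK", "CLE", "DET", "DET", "ATL", "BRK", "CLE", "DET", "NYK", "TOR")

# ave() splits the row positions by Team, applies seq_along to each group,
# and puts the results back in the original row order
gamenum <- ave(seq_along(Team), Team, FUN = seq_along)

print(data.frame(Team, gamenum))
```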
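I really don't think you want to assign .N to Date. You probably meant for the two added columns to be the sequence number and the number of games for that Team: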
DT[, ':='(gamenum = 1:.N, no_of_games = .N), by = Team]
This gives:
> DT
Date Team gamenum no_of_games
1: 2016-11-20 NYK 1 2
2: 2016-11-20 CLE 1 2
3: 2016-11-20 DET 1 3
4: 2016-11-21 DET 2 3
5: 2016-11-21 ATL 1 1
6: 2016-11-21 BRK 1 1
7: 2016-11-22 CLE 2 2
8: 2016-11-22 DET 3 3
9: 2016-11-22 NYK 2 2
10: 2016-11-22 TOR 1 1
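To tie this back to the error in the question: a Date column is stored as a double, while .N is an integer, which is why assigning .N over Date was rejected. If only the running count is needed, a minimal sketch could look like the following. It assumes the rows are sorted by Date and a reasonably recent data.table release that provides `rowid()`; the column name `gamenum2` is just for comparison.

```r
library(data.table)

DT <- data.table(
  Date = as.Date(c("2016-11-20", "2016-11-20", "2016-11-20",
                   "2016-11-21", "2016-11-21", "2016-11-21",
                   "2016-11-22", "2016-11-22", "2016-11-22", "2016-11-22")),
  Team = c("NYK", "CLE", "DET", "DET", "ATL", "BRK", "CLE", "DET", "NYK", "TOR")
)

# Dates are doubles under the hood, unlike the integer .N
date_storage <- typeof(DT$Date)

# seq_len(.N) counts 1, 2, ... within each Team, in current row order
DT[, gamenum := seq_len(.N), by = Team]

# rowid() produces the same running count without a by clause
DT[, gamenum2 := rowid(Team)]

print(DT)
```

Both columns should match the gamenum column shown above; `seq_len(.N)` is also a little safer than `1:.N` in general, since it returns an empty vector rather than `c(1, 0)` when a length is zero.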