Detect first occurrences between two variables in R
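I would like to flag the first occurrence of each pair of values of two variables (IPC and 2IPC) in R, excluding rows where the two variables are equal (i.e. keeping only rows where IPC != 2IPC).
Here is a sample of the dataset: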
**date IPC 2IPC occurrence**
1968 G01S Na 1
1969 G01N G01S 1
1969 B62D B43L 1
1969 G01S Na 0
1970 G01S G01C 1
1970 G01S H04B 1
1970 G01S H04B 0
1971 G01S H01S 1
1971 G01S G01S 0
1972 H04N H04N 0
1972 G01S G01S 0
1972 G01S G01S 0
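I used the Excel function COUNTIFS, which creates a dummy (occurrence) marking the first occurrence of each pair of the two variables. Is it possible to do this with dplyr?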
Using dplyr, and assuming the Na values are valid values rather than NA, you can run the following code:
library(dplyr)
mydf %>%
  group_by(IPC, X2IPC) %>%
  mutate(N_occurences = row_number()) %>%
  mutate(FirstOccurrence = case_when(
    (IPC != X2IPC) & N_occurences == 1 ~ 1,
    (IPC == X2IPC) | N_occurences != 1 ~ 0
  ))
You will get the following result:
X..date IPC X2IPC occurrence.. N_occurences FirstOccurrence
<int> <chr> <chr> <int> <int> <dbl>
1 1968 G01S Na 1 1 1.00
2 1969 G01N G01S 1 1 1.00
3 1969 B62D B43L 1 1 1.00
4 1969 G01S Na 0 2 0
5 1970 G01S G01C 1 1 1.00
6 1970 G01S H04B 1 1 1.00
7 1970 G01S H04B 0 2 0
8 1971 G01S H01S 1 1 1.00
9 1971 G01S G01S 0 1 0
10 1972 H04N H04N 0 1 0
11 1972 G01S G01S 0 2 0
12 1972 G01S G01S 0 3 0
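The assumption about Na matters because a comparison with a real NA returns NA rather than TRUE or FALSE. The following is a minimal base R sketch of that behaviour and one possible guard. The data frame `dat_na` and its values are hypothetical, and treating a missing second code as "different" is an assumption chosen to match the flags in the question.

```r
# Hypothetical data in which the missing second code is a real NA
dat_na <- data.frame(
  IPC   = c("G01S", "G01S", "G01S", "H04N"),
  X2IPC = c(NA, NA, "G01C", "H04N"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose (IPC, X2IPC) pair was already seen further up
is_repeat <- duplicated(dat_na[c("IPC", "X2IPC")])

# Naive comparison: `!=` yields NA wherever X2IPC is NA
naive <- as.numeric(!is_repeat & dat_na$IPC != dat_na$X2IPC)

# Guarded comparison (assumption: a missing second code counts as different)
differs <- is.na(dat_na$X2IPC) | dat_na$IPC != dat_na$X2IPC
guarded <- as.numeric(!is_repeat & differs)

print(data.frame(dat_na, naive, guarded))
```

With the guard in place the flag never comes out as NA; whether a missing code should count as a first occurrence at all depends on the data.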
If you want the same data frame as in the OP, just run:
mydf %>%
  group_by(IPC, X2IPC) %>%
  mutate(N_occurences = row_number()) %>%
  mutate(FirstOccurrence = case_when(
    (IPC != X2IPC) & N_occurences == 1 ~ 1,
    (IPC == X2IPC) | N_occurences != 1 ~ 0
  )) %>%
  select(1:3, 6)
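Since the two case_when branches are complements of each other, the same flag could probably be computed in one step without the helper column. This is a sketch rather than the answer's own code: it rebuilds the sample data from the question under the name `mydf`, and the use of `as.integer()` and the final `ungroup()` are choices made here.

```r
library(dplyr)

# Sample data copied from the question; `mydf` follows the naming used above
mydf <- data.frame(
  date  = c(1968L, 1969L, 1969L, 1969L, 1970L, 1970L,
            1970L, 1971L, 1971L, 1972L, 1972L, 1972L),
  IPC   = c("G01S", "G01N", "B62D", "G01S", "G01S", "G01S",
            "G01S", "G01S", "G01S", "H04N", "G01S", "G01S"),
  X2IPC = c("Na", "G01S", "B43L", "Na", "G01C", "H04B",
            "H04B", "H01S", "G01S", "H04N", "G01S", "G01S"),
  stringsAsFactors = FALSE
)

result <- mydf %>%
  group_by(IPC, X2IPC) %>%
  # row_number() restarts at 1 within each (IPC, X2IPC) group, so the flag
  # is 1 only for the first row of a group whose two codes differ
  mutate(FirstOccurrence = as.integer(row_number() == 1 & IPC != X2IPC)) %>%
  ungroup()

print(result)
```

Dropping the grouping at the end keeps later operations on `result` from silently running per group.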
Using base R's transform:
transform(dat,occurence=as.numeric(!duplicated(dat[2:3])&(IPC!=X2IPC)))
date IPC X2IPC occurence
1 1968 G01S Na 1
2 1969 G01N G01S 1
3 1969 B62D B43L 1
4 1969 G01S Na 0
5 1970 G01S G01C 1
6 1970 G01S H04B 1
7 1970 G01S H04B 0
8 1971 G01S H01S 1
9 1971 G01S G01S 0
10 1972 H04N H04N 0
11 1972 G01S G01S 0
12 1972 G01S G01S 0
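The one-liner above combines two logical vectors, and it may be easier to follow when they are computed separately. The sketch below does that on a small hypothetical data frame `toy`; the intermediate names `is_repeat` and `differs` are illustrative and not part of the answer.

```r
# Hypothetical four-row example using the same column names as `dat`
toy <- data.frame(
  IPC   = c("G01S", "G01S", "G01S", "G01S"),
  X2IPC = c("Na",   "Na",   "G01S", "H04B"),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame compares whole rows: TRUE marks a row whose
# (IPC, X2IPC) pair has already appeared earlier
is_repeat <- duplicated(toy[c("IPC", "X2IPC")])

# Element-wise check that the two codes are not the same
differs <- toy$IPC != toy$X2IPC

# A row is flagged only if it is the first of its pair and the codes differ
flag <- as.numeric(!is_repeat & differs)

print(cbind(toy, is_repeat, differs, flag))
```

Selecting the columns by name instead of by position (`dat[2:3]`) gives the same result here and keeps working if the column order changes.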
Data:
dat=structure(list(date = c(1968L, 1969L, 1969L, 1969L, 1970L, 1970L,
1970L, 1971L, 1971L, 1972L, 1972L, 1972L), IPC = c("G01S", "G01N",
"B62D", "G01S", "G01S", "G01S", "G01S", "G01S", "G01S", "H04N",
"G01S", "G01S"), X2IPC = c("Na", "G01S", "B43L", "Na", "G01C",
"H04B", "H04B", "H01S", "G01S", "H04N", "G01S", "G01S")), .Names = c("date",
"IPC", "X2IPC"), row.names = c(NA, -12L), class = "data.frame")