Checking and reviewing previous grouped values within groups in R
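Hi everyone, I hope you have had a great week.
I have a small dataset with 4 variables: subjects; keys, the code a subject uses to log into the system; order, which keeps track of the chronological order of the records; and period, which indicates whether the key was used at an earlier time (past) or in the current month (current).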
Here is the dataset:
subjects<-c(rep("James",3),
rep("Alex",2),
rep("Mila",8),
rep("Mark",1))
keys<-c(rep("IX08-8",2),"IX08-8",
"UX-007","HH-011",rep("PO_85",7),"UJ_8","785_PO")
order<-c(1:14)
period<-c("past","past","current","past","current",rep("past",6),"current","current","current")
df<-cbind(subjects,keys,period,order)
> head(df)
subjects keys period order
[1,] "James" "IX08-8" "past" "1"
[2,] "James" "IX08-8" "past" "2"
[3,] "James" "IX08-8" "current" "3"
[4,] "Alex" "UX-007" "past" "4"
[5,] "Alex" "HH-011" "current" "5"
[6,] "Mila" "PO_85" "past" "6"
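Ultimately, I need to tell whether a subject logged into the system during the current period with a previously used key. If the subject logged in during the current period with a new key, I assign the value "1" to a column called result; if the subject logged in during the current period with a previously used key, the value should be "0"; otherwise it should be NA.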
The output I want looks like this:
subjects keys period order result
[1,] "James" "IX08-8" "past" "1" NA
[2,] "James" "IX08-8" "past" "2" NA
[3,] "James" "IX08-8" "current" "3" "0"
[4,] "Alex" "UX-007" "past" "4" NA
[5,] "Alex" "HH-011" "current" "5" "1"
[6,] "Mila" "PO_85" "past" "6" NA
[7,] "Mila" "PO_85" "past" "7" NA
[8,] "Mila" "PO_85" "past" "8" NA
[9,] "Mila" "PO_85" "past" "9" NA
[10,] "Mila" "PO_85" "past" "10" NA
[11,] "Mila" "PO_85" "past" "11" NA
[12,] "Mila" "PO_85" "current" "12" "0"
[13,] "Mila" "UJ_8" "current" "13" "1"
[14,] "Mark" "785_PO" "current" "14" "1"
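For example, in row 3 James is assigned 0 in result because he logged in during the current month with a previously used key, "IX08-8". Mark, on the other hand, has a 1 in result because the system has tracked only one key for him, and that is the key he used to log in during the current period, so technically it is a "new key".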
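Both cases come down to a set-membership test against the subject's past keys. A minimal base-R sketch of just those two rows, with the past keys typed in by hand from the table above (james_past and mark_past are hypothetical names used only for illustration):

```r
# Hypothetical helper vectors: each subject's "past" keys, copied from the table
james_past <- c("IX08-8", "IX08-8")
mark_past  <- character(0)   # Mark has no past logins at all

# A current key is "new" (1) when it is not among the past keys, otherwise 0
james_result <- as.integer(!("IX08-8" %in% james_past))
mark_result  <- as.integer(!("785_PO" %in% mark_past))

james_result  # 0: the key was used before
mark_result   # 1: %in% against an empty vector is always FALSE
```

This is why Mark ends up with 1: with no past keys, nothing can match, so his only key counts as new.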
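What have I tried?
I can group the dataset by subjects and make sure it is sorted by order in descending order, but the only approach I can think of is to build a vector of keys (vector.of.previous.keys) for each subject from the rows where period == "past", and then check whether the current key is %in% vector.of.previous.keys. It would be more efficient if there were a way to check this criterion directly within each group. Thank you all very much for your help.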
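The approach described above does not strictly need a separate vector.of.previous.keys object: inside a grouped mutate(), keys[period == "past"] already refers only to the current subject's past keys. A sketch of that idea, assuming dplyr is installed and the data is rebuilt with data.frame() instead of cbind():

```r
library(dplyr)

subjects <- c(rep("James", 3), rep("Alex", 2), rep("Mila", 8), rep("Mark", 1))
keys <- c(rep("IX08-8", 2), "IX08-8",
          "UX-007", "HH-011", rep("PO_85", 7), "UJ_8", "785_PO")
order <- 1:14
period <- c("past", "past", "current", "past", "current",
            rep("past", 6), "current", "current", "current")
df <- data.frame(subjects, keys, period, order)

out <- df %>%
  arrange(order) %>%
  group_by(subjects) %>%
  # Within each subject: a current key gets 0 if it appears among that
  # subject's past keys and 1 otherwise; past rows stay NA
  mutate(result = if_else(period == "current",
                          as.integer(!keys %in% keys[period == "past"]),
                          NA_integer_)) %>%
  ungroup()

out
```

Because the membership check runs per group, a subject with no past rows (Mark) simply compares against an empty vector and gets 1.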
Assuming your data is stored in a data.frame,
df <- data.frame(subjects,keys,period,order)
you could use
library(dplyr)
df %>%
  arrange(order) %>%                      # make sure rows are in chronological order
  group_by(subjects, keys) %>%
  mutate(count = row_number()) %>%        # 1 = first time this subject uses this key
  ungroup() %>%
  mutate(result = case_when(period == "current" & count == 1 ~ 1,  # new key
                            period == "current" & count > 1 ~ 0,   # previously used key
                            TRUE ~ NA_real_)) %>%                  # past rows
  select(-count)
to get
# A tibble: 14 x 5
subjects keys period order result
<chr> <chr> <chr> <int> <dbl>
1 James IX08-8 past 1 NA
2 James IX08-8 past 2 NA
3 James IX08-8 current 3 0
4 Alex UX-007 past 4 NA
5 Alex HH-011 current 5 1
6 Mila PO_85 past 6 NA
7 Mila PO_85 past 7 NA
8 Mila PO_85 past 8 NA
9 Mila PO_85 past 9 NA
10 Mila PO_85 past 10 NA
11 Mila PO_85 past 11 NA
12 Mila PO_85 current 12 0
13 Mila UJ_8 current 13 1
14 Mark 785_PO current 14 1
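The count == 1 test is asking whether a row is the first occurrence of its (subjects, keys) pair. For comparison, a base-R sketch of the same rule using duplicated(), assuming, as the dplyr code does, that rows are in chronological order once sorted by order:

```r
# Rebuild the example data
df <- data.frame(
  subjects = c(rep("James", 3), rep("Alex", 2), rep("Mila", 8), "Mark"),
  keys = c(rep("IX08-8", 3), "UX-007", "HH-011", rep("PO_85", 7), "UJ_8", "785_PO"),
  period = c("past", "past", "current", "past", "current",
             rep("past", 6), "current", "current", "current"),
  order = 1:14
)

df <- df[order(df$order), ]   # sort rows chronologically

# TRUE the first time a subject uses a given key (equivalent to count == 1)
first_use <- !duplicated(df[c("subjects", "keys")])

# Current rows: 1 for a first use, 0 for a repeat; past rows: NA
df$result <- ifelse(df$period == "current", as.integer(first_use), NA_integer_)

df$result
```

Like the dplyr version, this treats any earlier row with the same key (past or current) as a previous use.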
Another dplyr approach, with all of your conditions encoded in the case_when statement.
Code
library(dplyr)
df %>%
  group_by(subjects) %>%
  mutate(result = case_when(period == "current" & n() == 1 ~ "1",                     # only one key on record, so it is new
                            period == "current" & keys == first(keys) ~ "0",          # same key as the subject's first login
                            period == "current" & keys != first(keys) & n() > 1 ~ "1", # different key from the first login
                            period == "past" ~ NA_character_,
                            TRUE ~ NA_character_))                                    # fallback for any other value
# A tibble: 14 × 5
# Groups: subjects [4]
subjects keys period order result
<chr> <chr> <chr> <int> <chr>
1 James IX08-8 past 1 NA
2 James IX08-8 past 2 NA
3 James IX08-8 current 3 0
4 Alex UX-007 past 4 NA
5 Alex HH-011 current 5 1
6 Mila PO_85 past 6 NA
7 Mila PO_85 past 7 NA
8 Mila PO_85 past 8 NA
9 Mila PO_85 past 9 NA
10 Mila PO_85 past 10 NA
11 Mila PO_85 past 11 NA
12 Mila PO_85 current 12 0
13 Mila UJ_8 current 13 1
14 Mark 785_PO current 14 1
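One caveat with this version: keys == first(keys) compares the current key with the subject's first key only, not with every past key. That is enough for this sample, but a hypothetical subject (called "Zoe" here purely for illustration) who used keys A and B in the past and B in the current period would get "1" even though B was used before. A small sketch comparing the first-key test with a %in% check against all past keys, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical subject, invented only to illustrate the edge case
zoe <- data.frame(subjects = "Zoe",
                  keys = c("A", "B", "B"),
                  period = c("past", "past", "current"),
                  order = 1:3)

check <- zoe %>%
  group_by(subjects) %>%
  mutate(
    # first-key test, as in the case_when() answer
    first_key = case_when(period == "current" & keys == first(keys) ~ "0",
                          period == "current" & keys != first(keys) ~ "1",
                          TRUE ~ NA_character_),
    # membership test against all of the subject's past keys
    any_past = case_when(period == "current" & keys %in% keys[period == "past"] ~ "0",
                         period == "current" ~ "1",
                         TRUE ~ NA_character_)
  ) %>%
  ungroup()

check
```

For the current row, first_key is "1" while any_past is "0", which is what the rule in the question asks for.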
Data
Note that I changed your cbind() to data.frame() (data frames are easier to work with than matrices).
subjects<-c(rep("James",3),
rep("Alex",2),
rep("Mila",8),
rep("Mark",1))
keys<-c(rep("IX08-8",2),"IX08-8",
"UX-007","HH-011",rep("PO_85",7),"UJ_8","785_PO")
order<-c(1:14)
period<-c("past","past","current","past","current",rep("past",6),"current","current","current")
df<-data.frame(subjects,keys,period,order)
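The reason for that change: cbind() on character and integer vectors returns a character matrix, so order is stored as text, which is why the question's head(df) output shows "1", "2", ... in quotes. A quick base-R check of the column types under both constructors, using a two-row excerpt of the data:

```r
subjects <- c("James", "Alex")
keys <- c("IX08-8", "HH-011")
period <- c("past", "current")
order <- 1:2

m <- cbind(subjects, keys, period, order)       # a single character matrix
d <- data.frame(subjects, keys, period, order)  # each column keeps its own type

is.matrix(m)           # TRUE
typeof(m[, "order"])   # "character"
class(d$order)         # "integer"
```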