Assign Number to Column Based on Index of Next Value in 2nd Column
I have the following data frame:
df <- data.frame(t = c("h","h","h","a","a","h","a","a","h","a","h","a","a"), time = c(1,1,1,1,1,1,1,1,1,1,1,2,2),
key = c("no", "no", "no","yes","no","no","no","no","yes","yes","no","no","no"),
expected = c(-1,-1,-1,1,-1,1,-1,-1,1,1,0,0,0))
   t time key expected myTest1 myTest2 myTest3
1  h    1  no       -1      -1       1       1
2  h    1  no       -1      -1       1       1
3  h    1  no       -1      -1       1       1
4  a    1 yes        1       1       1      -1
5  a    1  no       -1       1       0      -1
6  h    1  no        1      -1       0       1
7  a    1  no       -1       1       0      -1
8  a    1  no       -1       1       0      -1
9  h    1 yes        1      -1       0       1
10 a    1 yes        1       1       1      -1
11 h    1  no        0      -1       0       1
12 a    2  no        0       1       0      -1
13 a    2  no        0       1       0      -1
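I am trying to recreate a column like expected. Grouping by the time column, the first condition is to assign 1 to every row that has "yes" in the key column. The other conditions are:
- If the row containing the next "yes" in key has "h" in the t column, assign 1 to every "h" row up to that "yes" row, and -1 to every "a" row
- If the row containing the next "yes" in key has "a" in the t column, assign 1 to every "a" row up to that "yes" row, and -1 to every "h" row
- If there are no more "yes" rows in the current time section, assign 0 to the row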
I first tried a nested for loop:
df$myTest1 <- 0
testIdx <- which(df$key %in% "yes")
df$myTest1[testIdx] <- 1
for (i in 1:length(testIdx)) {
  for (j in 1:nrow(df)) {
    df$myTest1[j] <- ifelse(df$t[testIdx[i]] == "h" & df$t[j] == "h", 1,
                     ifelse(df$t[testIdx[i]] == "h" & df$t[j] == "a", -1,
                     ifelse(df$t[testIdx[i]] == "a" & df$t[j] == "h", -1,
                     ifelse(df$t[testIdx[i]] == "a" & df$t[j] == "a", 1, 0))))
  }
}
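This produces the correct values in myTest1 up to and including the first "yes", but every row after that is wrong.
I also tried two other approaches, which created myTest2 and myTest3: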
df$myTest2 <- cumsum(c(1, head(df$key == "yes", -1))) %% 2
library(dplyr)  # provides %>%, mutate() and case_when()
df <- df %>%
  mutate(myTest3 = case_when(t == "h" ~ 1,   # add if next "yes" is also "h" condition
                             t == "a" ~ -1,
                             TRUE ~ 0))
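Using case_when() is similar to ifelse, but I don't know how to add the other conditions without a for loop.
To clarify, the expected column reads the way it does because the first "yes" belongs to a row with "a", so all preceding "h" rows get -1 and all preceding "a" rows get 1. The next "yes" row has "h", so the rows between the two "yes" rows get 1 for "h" and -1 for "a". Row 10 contains a "yes" and comes immediately after another "yes", so it simply gets 1. Row 11 is the last row with time = 1 and has no "yes" after it, so it is assigned 0. There are no "yes" rows when time = 2, so all rows there also receive 0.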
This may help you.
The magic happens in the na.locf function from the zoo package.
library(magrittr)
library(zoo)
doblock <- function(timeblock) {
  yesrows <- which(timeblock$key == "yes")
  if (length(yesrows) == 0) {
    # no yes rows in timeblock: make all 0
    timeblock$exp2 <- 0
  } else {
    # create a vector of a's and h's against which we need to match the t field
    tomatch <- rep(NA, nrow(timeblock))
    tomatch[yesrows] <- as.character(timeblock$t)[yesrows]
    tomatch <- zoo::na.locf(tomatch, fromLast = TRUE)
    # now do the matching
    timeblock$exp2 <- 0 # set default as 0 (for those entries after the last 'yes')
    timeblock$exp2[1:length(tomatch)] <-
      mapply(function(t1, t2) {
        if ((t1) == t2) 1 else -1
      }, as.character(timeblock$t[1:length(tomatch)]), tomatch)
  }
  timeblock
}
# split dataframe into blocks for each 'time' and apply function to every time-block
newdf <-
lapply(split(df, df$time), doblock) %>%
do.call(rbind, .)
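To see why only the first length(tomatch) rows of each block get overwritten, it can help to run na.locf on a tiny vector. The snippet below is only an illustration: the vector x is made up, and it assumes the zoo package is installed. With fromLast = TRUE each NA is filled from the next non-NA value, and with the default na.rm = TRUE the NAs after the last real value are dropped, so the result is shorter than the input.

```r
library(zoo)

# Made-up example: "a" and "h" mark the positions of "yes" rows,
# NA marks "no" rows. The last element has no "yes" after it.
x <- c(NA, "a", NA, "h", NA)

# Fill each NA from the NEXT non-NA value. Trailing NAs are dropped
# because na.rm defaults to TRUE.
filled <- na.locf(x, fromLast = TRUE)
print(filled)         # "a" "a" "h" "h"

# Keeping the trailing NA instead preserves the original length.
filled_keep <- na.locf(x, fromLast = TRUE, na.rm = FALSE)
print(filled_keep)    # "a" "a" "h" "h" NA
```

Because the filled vector stops at the last "yes", the rows after it are never touched and keep the default value of 0.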
The result looks like this, where exp2 is the output of doblock. It matches your expected field.
     t time key expected exp2
1.1  h    1  no       -1   -1
1.2  h    1  no       -1   -1
1.3  h    1  no       -1   -1
1.4  a    1 yes        1    1
1.5  a    1  no       -1   -1
1.6  h    1  no        1    1
1.7  a    1  no       -1   -1
1.8  a    1  no       -1   -1
1.9  h    1 yes        1    1
1.10 a    1 yes        1    1
1.11 h    1  no        0    0
2.12 a    2  no        0    0
2.13 a    2  no        0    0
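Since the question already uses case_when(), the same idea can probably be written as a single pipeline. This is a sketch rather than a tested drop-in replacement: it assumes dplyr and tidyr are installed, swaps na.locf for tidyr's fill() with .direction = "up", and the names next_yes_t and exp3 are invented here for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  t = c("h","h","h","a","a","h","a","a","h","a","h","a","a"),
  time = c(1,1,1,1,1,1,1,1,1,1,1,2,2),
  key = c("no","no","no","yes","no","no","no","no","yes","yes","no","no","no"),
  expected = c(-1,-1,-1,1,-1,1,-1,-1,1,1,0,0,0),
  stringsAsFactors = FALSE  # keep t and key as character on older R versions
)

result <- df %>%
  group_by(time) %>%
  # next_yes_t (hypothetical helper column): t of each "yes" row, NA elsewhere
  mutate(next_yes_t = if_else(key == "yes", t, NA_character_)) %>%
  # fill upwards within each time group, so every row sees the t of the next "yes"
  fill(next_yes_t, .direction = "up") %>%
  mutate(exp3 = case_when(
    is.na(next_yes_t) ~ 0,   # no "yes" left in this time group
    t == next_yes_t   ~ 1,   # same letter as the next "yes" row
    TRUE              ~ -1   # different letter
  )) %>%
  ungroup() %>%
  select(-next_yes_t)

print(all(result$exp3 == result$expected))  # TRUE
```

The grouping does the work that split() and lapply() do in the answer: fill() stops at group boundaries, so a "yes" in one time section never leaks into another.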