Find an element following a sequence across rows in a data frame

Question

I have a data set structured as shown below.

# example data set 

a <- "a"
b <- "b"
d <- "d"

id1 <- c(a,a,a,a,b,b,d,d,a,a,d)
id2 <- c(b,d,d,d,a,a,a,a,b,b,d)
id3 <- c(b,d,d,a,a,a,a,d,b,d,d)

dat <- rbind(id1,id2,id3)
dat <- data.frame(dat)

In each row, I need to find the first sequence of repeated "a" elements and identify the element that immediately follows that sequence.

# desired results

dat$s3 <- c("b","b","d")
dat

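I was able to break the problem into 3 steps and solve the first one, but since my programming skills are quite limited, I would appreciate any advice on how to approach steps 2 and 3. If you have an idea for solving the problem in another way, that would be very helpful as well.

Here is what I have so far: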

# Step 1: find the first occurrence of "a" in the first sequence
dat$s1 <- apply(dat, 1, function(x) match(a,x))

# Step 2: find the last occurrence in the first sequence

# Step 3: find the element following the last occurrence in the first sequence

Thanks in advance!

Answer 1

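Well, here is a somewhat messy attempt:

# For each row, get the positions of "a", then keep only the first run of consecutive positions
l1 <- lapply(apply(dat, 1, function(i) as.integer(which(i == a))), 
                           function(j) j[cumsum(c(1, diff(j) != 1)) == 1])

# Column index of the element right after that first run
ind <- unname(sapply(l1, function(i) tail(i, 1) + 1))

# Row k needs column ind[k], which sits on the diagonal of dat[ind]
dat$s3 <- diag(as.matrix(dat[ind]))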

dat$s3
#[1] "b" "b" "d"

Or wrap it in a function:

fun1 <- function(df){
  l1 <- lapply(apply(df, 1, function(i) as.integer(which(i == a))), 
               function(j) j[cumsum(c(1, diff(j) != 1)) == 1])
  ind <- unname(sapply(l1, function(i) tail(i, 1) + 1))
  return(diag(as.matrix(df[ind])))
}

fun1(dat)
#[1] "b" "b" "d"
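
The three steps from the question can also be sketched with run-length encoding via base R's `rle()`. This is only an illustration under stated assumptions: a "sequence" is taken to mean at least two consecutive "a" values, and `next_after_first_run`, `target`, and `min_len` are hypothetical names that do not appear in the original posts.

```r
# Example data from the question
id1 <- c("a","a","a","a","b","b","d","d","a","a","d")
id2 <- c("b","d","d","d","a","a","a","a","b","b","d")
id3 <- c("b","d","d","a","a","a","a","d","b","d","d")
dat <- data.frame(rbind(id1, id2, id3), stringsAsFactors = FALSE)

# Hypothetical helper: return the element right after the first run of
# `target` that is at least `min_len` long. Returns NA if no such run
# exists or if the run ends at the last position of the row.
next_after_first_run <- function(x, target = "a", min_len = 2) {
  x <- as.character(x)
  r <- rle(x)
  ends <- cumsum(r$lengths)                  # last index of each run
  hit <- which(r$values == target & r$lengths >= min_len)
  if (length(hit) == 0) return(NA_character_)
  pos <- ends[hit[1]] + 1                    # position right after the run
  if (pos > length(x)) NA_character_ else x[pos]
}

s3 <- unname(apply(dat, 1, next_after_first_run))
s3
#[1] "b" "b" "d"
```

Unlike indexing with `tail(i, 1) + 1` directly, this version returns NA instead of failing when the first run reaches the end of a row.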

Answer 2

Try this (assuming each row contains a repeated "a"):

library(stringr)
dat$s3 <- apply(dat, 1, function(x) str_match(paste(x, collapse=''), 'aa([^a])')[,2])

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 s3
id1  a  a  a  a  b  b  d  d  a   a   d  b
id2  b  d  d  d  a  a  a  a  b   b   d  b
id3  b  d  d  a  a  a  a  d  b   d   d  d
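
For comparison, the same pattern can be sketched in base R without stringr. This assumes every cell holds exactly one character, so that a position in the pasted string corresponds to a column; `row_strings` and `start` are names made up for this illustration.

```r
# Example data from the question
id1 <- c("a","a","a","a","b","b","d","d","a","a","d")
id2 <- c("b","d","d","d","a","a","a","a","b","b","d")
id3 <- c("b","d","d","a","a","a","a","d","b","d","d")
dat <- data.frame(rbind(id1, id2, id3), stringsAsFactors = FALSE)

# Collapse each row into a single string, e.g. "aaaabbddaad"
row_strings <- apply(dat, 1, paste, collapse = "")

# Start of the first "aa" that is followed by a non-"a" character (-1 if none)
start <- as.integer(regexpr("aa[^a]", row_strings))

# The character two positions after the start is the element after the run
s3 <- unname(substring(row_strings, start + 2, start + 2))
s3[start < 0] <- NA_character_   # rows without a match get NA
s3
#[1] "b" "b" "d"
```

Both the regex and this sketch break down if a cell can contain more than one character, because string positions then no longer line up with columns.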

Answer 3

I would use filter:

fun <- function(x) {
  x <- as.character(x)
  isa <- (x == "a") #find "a" values

  #find sequences with two TRUE values and the last value FALSE
  ids <- stats::filter(isa, c(1,1,1), sides = 1) == 2L & !isa

  na.omit(x[ids])[1] #subset     
}

apply(dat, 1, fun)
#id1 id2 id3 
#"b" "b" "d"
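
To see why the `== 2L & !isa` condition picks out the right positions, it may help to trace the intermediate values for a single row. The sketch below uses row id1 from the question; `win` is a name introduced here for the running sum and is not part of the original answer.

```r
x <- c("a","a","a","a","b","b","d","d","a","a","d")  # row id1
isa <- x == "a"

# With sides = 1, each value is the sum of the current and the two
# previous entries of isa; the first two positions have no full window (NA)
win <- as.numeric(stats::filter(isa, c(1, 1, 1), sides = 1))

# A window sum of 2 at a position that is not "a" means the two
# preceding entries were both "a"
ids <- win == 2 & !isa
which(ids)
#[1]  5 11

x[ids][1]
#[1] "b"
```

When a row does not start with "a", the first two entries of `ids` are NA rather than FALSE, which appears to be why the answer wraps the subset in `na.omit()` before taking the first element.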
