R - Lag not working within function [objective: match similar adjacent rows]

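I recently tried to match adjacent identical rows in a data frame based on two variables (Condition1 and Outcome1 below). I have seen this done across all rows, but not for adjacent rows only, which is why I came up with the following three-step workaround (I hope I am not overthinking it):

- I lagged the variables that the matching should be based on.

- I compared each variable with its lagged version.

- I deleted all rows identical to their neighbour (and dropped the helper columns that were no longer needed).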

Case <- c("Case 1", "Case 2", "Case 3", "Case 4", "Case 5")
Condition1 <- c(0, 1, 0, 0, 1)
Outcome1 <- c(0, 0, 0, 0, 1)
mwa.df <- data.frame(Case, Condition1, Outcome1)

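new.df <- mwa.df
# Shift each column up by one row so that row i holds the value of row i + 1 (last row padded with 0)
Condition_lag <- c(new.df$Condition1[-1],0)
Outcome_lag <- c(new.df$Outcome1[-1],0)
new.df <- cbind(new.df, Condition_lag, Outcome_lag)
# Flag rows whose two variables both equal the shifted values
new.df$Comp <- 0
new.df$Comp[new.df$Outcome1 == new.df$Outcome_lag & new.df$Condition1 == new.df$Condition_lag] <- 1
# Keep only the unflagged rows, then drop the helper columns
new.df <- subset(new.df, Comp == 0)
new.df <- subset(new.df, select = -c(Condition_lag, Outcome_lag, Comp))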

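This works fine. But when I tried to wrap it in a function, because I have to run this operation on a large number of data frames, I ran into the problem that the lag does not work (i.e. the operations condition_lag <- c(new.df$condition[-1],0) and outcome_lag <- c(new.df$outcome[-1],0) are not carried out). The function code is: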

FLC.Dframe <- function(old.df, condition, outcome){
      new.df <- old.df
      condition_lag <- c(new.df$condition[-1],0)
      outcome_lag <- c(new.df$outcome[-1],0)
      new.df <- cbind(new.df, condition_lag, outcome_lag)
      new.df$comp <- 0
      new.df$comp[new.df$outcome == new.df$outcome_lag & new.df$condition == new.df$condition_lag] <- 1
      new.df <- subset(new.df, comp == 0)
      new.df <- subset(new.df, select = -c(condition_lag, outcome_lag, comp))
      return(new.df)
}

To use the function, I wrote new.df <- FLC.Dframe(mwa.df, Condition1, Outcome1)

Can someone help me solve this problem? Many thanks.

Just generate run-length IDs and drop the duplicates.

with(mwa.df, mwa.df[!duplicated(data.table::rleid(Condition1, Outcome1)), ])

Output

    Case Condition1 Outcome1
1 Case 1          0        0
2 Case 2          1        0
3 Case 3          0        0
5 Case 5          1        1
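To see what the run-length IDs look like without installing data.table, here is a minimal base R sketch of the same idea. The names n, changed, run_id and first_of_run are illustrative and not part of the original answer, and the comparison assumes the two columns contain no NA values.

```r
Case <- c("Case 1", "Case 2", "Case 3", "Case 4", "Case 5")
Condition1 <- c(0, 1, 0, 0, 1)
Outcome1 <- c(0, 0, 0, 0, 1)
mwa.df <- data.frame(Case, Condition1, Outcome1)

n <- nrow(mwa.df)

# TRUE where a row differs from the row above it in at least one column;
# the first row always starts a new run
changed <- with(mwa.df, c(TRUE, Condition1[-1] != Condition1[-n] |
                                Outcome1[-1] != Outcome1[-n]))

# The running count of changes gives one ID per run of identical adjacent rows
run_id <- cumsum(changed)

# Keep the first row of every run
first_of_run <- mwa.df[!duplicated(run_id), ]
```

Here run_id is 1 2 3 3 4: rows 3 and 4 share an ID because they are adjacent and identical, so only the first of them survives.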

If you want a function, then

FLC.Dframe <- function(df, cols) df[!duplicated(data.table::rleidv(df[, cols])), ]

Call the function like this

> FLC.Dframe(mwa.df, c("Condition1", "Outcome1"))

    Case Condition1 Outcome1
1 Case 1          0        0
2 Case 2          1        0
3 Case 3          0        0
5 Case 5          1        1

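The main problem with your function is the incorrect use of $. This operator does not evaluate its right-hand side; it takes the name literally. For example, in new.df$condition the $ operator looks for a column named "condition" in new.df, not for "Condition1", which is the value held by condition. No such column exists, so the result is NULL and the lag is never built from your data. If you rewrite your function as follows, it should work.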

FLC.Dframe <- function(old.df, condition, outcome){
  new.df <- old.df
  condition_lag <- c(new.df[[condition]][-1],0)
  outcome_lag <- c(new.df[[outcome]][-1],0)
  new.df <- cbind(new.df, condition_lag, outcome_lag)
  new.df$comp <- 0
  new.df$comp[new.df[[outcome]] == new.df[["outcome_lag"]] & new.df[[condition]] == new.df[["condition_lag"]]] <- 1
  new.df <- subset(new.df, comp == 0)
  new.df <- subset(new.df, select = -c(condition_lag, outcome_lag, comp))
  return(new.df)
} 
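The difference between $ and [[ that this rewrite relies on can be checked in isolation. This is a small sketch with a throwaway data frame; demo.df and the variable condition are illustrative names only.

```r
demo.df <- data.frame(Condition1 = c(0, 1, 0))
condition <- "Condition1"

# `$` uses the name literally: there is no column called "condition"
by_dollar <- demo.df$condition

# `[[` evaluates its argument, so the string stored in `condition` is used
by_bracket <- demo.df[[condition]]

print(is.null(by_dollar))  # TRUE
print(by_bracket)          # 0 1 0
```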

You also need to call it like this (note that the column names must be passed as character strings)

> FLC.Dframe(mwa.df, "Condition1", "Outcome1")

    Case Condition1 Outcome1
1 Case 1          0        0
2 Case 2          1        0
4 Case 4          0        0
5 Case 5          1        1
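Note that this output keeps Case 4 while the rleid version keeps Case 3: comparing each row with the next one retains the last row of a run, whereas run-length IDs with duplicated() retain the first. If the first row is the one you want, one possible base R variant is sketched below. The helper keep_first_of_run is hypothetical and not from either post, and it assumes the key columns contain no NA values. It compares each row with the previous one, so it needs no padding value; padding the shifted column with 0 could otherwise match a genuine 0 in the last row.

```r
# Hypothetical helper: keep the first row of each run of adjacent rows
# that are identical in the columns named in `cols`
keep_first_of_run <- function(df, cols) {
  n <- nrow(df)
  if (n < 2) return(df)
  # For each key column, TRUE where a row differs from the row above it
  per_col <- lapply(df[cols], function(x) x[-1] != x[-n])
  # A row starts a new run if any key column changed
  differs <- Reduce(`|`, per_col)
  df[c(TRUE, differs), , drop = FALSE]
}

mwa.df <- data.frame(
  Case = c("Case 1", "Case 2", "Case 3", "Case 4", "Case 5"),
  Condition1 = c(0, 1, 0, 0, 1),
  Outcome1 = c(0, 0, 0, 0, 1)
)

res <- keep_first_of_run(mwa.df, c("Condition1", "Outcome1"))
print(res)
```

For the example data this keeps rows 1, 2, 3 and 5, matching the rleid output.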