Summarise durations of string duplicates in the presence of `NA`

I have gaze-direction data in the columns A_aoi and B_aoi, and the corresponding gaze durations in the columns A_dur and B_dur:

df <- data.frame(
  id = 1:4,
  A_aoi = c("C*BB*B", "C*BCCC", "B**", "C*B"),
  A_dur = c("234,312,222,3456,1112,77", "12,13,14,15,11,1654", "896,45222,55", "5554,322,142"),
  B_aoi = c("**ACC", "AC*", "AAA", "C*A*"),
  B_dur = c("12,13,15,100,100", "1,2,3", "88,99,100", "1,2,3,4")
)

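In some cases, two or more directly adjacent values are duplicates (i.e. measurements of the same type); for example, the first A_aoi value contains the string BB, and the second contains CCC.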
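To make "directly adjacent" concrete, here is a minimal sketch using base R's rle() on the first A_aoi value. It is only an illustration of how the runs fall out; the variable names are mine and not part of the data:

```r
# Split the first A_aoi string into single characters.
aoi <- strsplit("C*BB*B", "")[[1]]

# rle() collapses consecutive equal values into runs.
runs <- rle(aoi)
runs$values   # one entry per run
runs$lengths  # how many adjacent characters each run covers
```

Note that the two B runs stay separate because a * sits between them; only the adjacent BB counts as a duplicate.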

I need to sum the durations of these duplicates. With the help of code from a previous question, I am able to accomplish this:

library(data.table)

calculate <- function(p, q) {
  mapply(function(x, y) toString(tapply(as.numeric(x), rleid(y), sum)), 
         strsplit(p, ','), strsplit(q, ''))
}

aoi_cols <- grep('aoi', names(df))
dur_cols <- grep('dur', names(df))
df[dur_cols] <- Map(calculate, df[dur_cols], df[aoi_cols])
df
  id  A_aoi                    A_dur B_aoi       B_dur
1  1 C*BB*B 234, 312, 3678, 1112, 77 **ACC 25, 15, 200
2  2 C*BCCC         12, 13, 14, 1680   AC*     1, 2, 3
3  3    B**               896, 45277   AAA         287
4  4    C*B           5554, 322, 142  C*A*  1, 2, 3, 4

BUT: my actual data contains NA values. For example, in this slightly modified df I have added an NA value to column B_dur, and the code throws an error:

df <- data.frame(
  id = 1:4,
  A_aoi = c("C*BB*B", "C*BCCC", "B**", "C*B"),
  A_dur = c("234,312,222,3456,1112,77", "12,13,14,15,11,1654", "896,45222,55", "5554,322,142"),
  B_aoi = c("**ACC", "AC*", "AAA", "C*A*"),
  B_dur = c("12,13,15,100,100", NA, "88,99,100", "1,2,3,4")
)

How can I accomplish the task when NA values are present, so that the result looks like this:

df
  id  A_aoi                    A_dur B_aoi       B_dur
1  1 C*BB*B 234, 312, 3678, 1112, 77 **ACC 25, 15, 200
2  2 C*BCCC         12, 13, 14, 1680   AC*        <NA>
3  3    B**               896, 45277   AAA         287
4  4    C*B           5554, 322, 142  C*A*  1, 2, 3, 4

You can modify the calculate function to check for NA values.

library(data.table)

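calculate <- function(p, q) {
  mapply(function(x, y) {
    # strsplit() turns a missing duration string into a single NA; pass it through
    if (any(is.na(x))) NA_character_
    # otherwise sum the durations within each run of identical AOI characters
    else toString(tapply(as.numeric(x), rleid(y), sum))
  }, strsplit(p, ','), strsplit(q, ''))
}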

df[dur_cols] <- Map(calculate, df[dur_cols], df[aoi_cols])
df

#  id  A_aoi                    A_dur B_aoi       B_dur
#1  1 C*BB*B 234, 312, 3678, 1112, 77 **ACC 25, 15, 200
#2  2 C*BCCC         12, 13, 14, 1680   AC*        <NA>
#3  3    B**               896, 45277   AAA         287
#4  4    C*B           5554, 322, 142  C*A*  1, 2, 3, 4
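For anyone wondering why the unmodified function fails, the sketch below shows the length mismatch on the NA row and then applies the same guard without data.table. The helpers run_id and sum_runs are hypothetical names introduced for illustration; run_id is only a base-R stand-in for rleid(), built on rle().

```r
# Splitting NA yields a length-1 vector holding NA, not a vector of durations,
# so it no longer lines up with the three AOI characters of that row.
dur <- strsplit(c("12,13,15,100,100", NA), ",")
aoi <- strsplit(c("**ACC", "AC*"), "")
lengths(dur)
lengths(aoi)

# Base-R stand-in for data.table::rleid(): one integer ID per run of equal values.
run_id <- function(v) {
  r <- rle(v)
  rep(seq_along(r$lengths), r$lengths)
}

# Hypothetical helper: sum durations within each run, or return NA for a missing row.
sum_runs <- function(x, y) {
  if (anyNA(x)) return(NA_character_)
  toString(tapply(as.numeric(x), run_id(y), sum))
}

res <- mapply(sum_runs, dur, aoi)
res
```

Because tapply() requires its arguments to have the same length, the guard has to run before tapply() is reached, which is what the early return does here.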