Comparing and filtering across multiple columns using string matching in R

Question

I have a data frame like this:

    TS        Device1.max  Device2.max  Device3.max  Device4.max
    18:02:44  FALSE        FALSE        TRUE         FALSE
    18:02:45  TRUE         TRUE         FALSE        FALSE
    18:02:46  FALSE        FALSE        FALSE        TRUE
    18:02:47  FALSE        FALSE        FALSE        FALSE
    18:02:48  FALSE        FALSE        FALSE        FALSE
    18:02:49  FALSE        FALSE        FALSE        FALSE
    18:02:50  FALSE        FALSE        FALSE        FALSE
    18:02:51  FALSE        FALSE        FALSE        FALSE
    18:02:52  FALSE        FALSE        FALSE        TRUE
    18:02:53  FALSE        TRUE         FALSE        FALSE
    18:02:54  FALSE        FALSE        FALSE        FALSE

To get the TRUE/FALSE columns, I used the following code:

# One statement per column: TRUE where the value equals that column's maximum
df$Device1.max <- df$Device1 == max(df$Device1)
df$Device2.max <- df$Device2 == max(df$Device2)
df$Device3.max <- df$Device3 == max(df$Device3)
df$Device4.max <- df$Device4 == max(df$Device4)

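For simplicity I've shown only 4 device columns. In reality I have about a hundred device columns to compare, and writing a hundred ifelse statements, one per column, isn't feasible. How can I use a regular expression, or some generic column name, to do the comparison, given that all the device columns of interest have names starting with "Device"?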
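A minimal base-R sketch of the first part, assuming the raw numeric columns are named Device1, Device2, and so on (as in the df$Device1 calls above); the three rows of values are made up purely for illustration. grep() picks the columns by a regular expression and lapply() applies the same max comparison to each, so no per-column statement is needed.

```r
# Toy data standing in for the real frame (values are illustrative only)
df <- data.frame(
  TS = c("18:02:44", "18:02:45", "18:02:46"),
  Device1 = c(3, 9, 1),
  Device2 = c(2, 5, 4),
  Device3 = c(7, 1, 7)
)

# Columns named "Device" followed only by digits; the anchored pattern
# keeps any existing "DeviceN.max" columns from being matched on a rerun
dev_cols <- grep("^Device[0-9]+$", names(df), value = TRUE)

# For each device column, flag the rows equal to that column's maximum
max_flags <- lapply(df[dev_cols], function(x) x == max(x, na.rm = TRUE))

# Add the flags as new columns Device1.max, Device2.max, ...
df[paste0(dev_cols, ".max")] <- max_flags
df
```

Like the original ifelse version, ties produce more than one TRUE in a column (Device3 above has two). With dplyr installed, mutate(across(starts_with("Device"), ...)) is a common alternative, but the base-R version has no dependencies.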

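Next, I want to filter, or find, the rows where the largest number of Device.max columns satisfy the condition of being TRUE within +/-1 row of each other. Algorithmically, I would create an index column and filter the data frame down to the rows where TRUE values exist. Then I would check how many columns have their indices within 1 row of each other. In the example above, rows 1, 2 and 3 together have 4 columns meeting the TRUE condition, while rows 9 and 10 have only 2 columns meeting it. So my expected output is: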

    TS        Device1.max  Device2.max  Device3.max  Device4.max
    18:02:44  FALSE        FALSE        TRUE         FALSE
    18:02:45  TRUE         TRUE         FALSE        FALSE
    18:02:46  FALSE        FALSE        FALSE        TRUE

But this approach seems very iterative and inefficient. Is there a better way to do this with R's data frame functions?
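One vectorized way to implement the grouping algorithm described in the question, sketched in base R. It assumes that TRUE rows chain into a single group whenever consecutive TRUE rows are at most one row apart, and the helper names (hit_rows, cluster, n_cols) are illustrative. The steps: find the rows containing any TRUE, start a new group wherever the gap exceeds one row (diff() plus cumsum()), count the distinct Device columns that are TRUE in each group, and keep the group with the largest count.

```r
# The eleven rows from the question
df <- data.frame(
  TS = sprintf("18:02:%02d", 44:54),
  Device1.max = c(FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Device2.max = c(FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,  FALSE),
  Device3.max = c(TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Device4.max = c(FALSE, FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,  FALSE, FALSE)
)

# Logical matrix of the flag columns, selected by name pattern
flag_cols <- grep("^Device.*\\.max$", names(df), value = TRUE)
m <- as.matrix(df[flag_cols])

# Rows holding at least one TRUE
hit_rows <- which(rowSums(m) > 0)

# Group id: increments whenever the next TRUE row is more than one row away
cluster <- cumsum(c(1, diff(hit_rows) > 1))

# Number of distinct columns that are TRUE somewhere within each group
n_cols <- tapply(hit_rows, cluster, function(rows) sum(colSums(m[rows, , drop = FALSE]) > 0))

# Keep the group covering the most columns (which.max takes the first on ties)
best_rows <- hit_rows[cluster == which.max(n_cols)]
df[best_rows, ]
```

With the question's data the groups are rows 1 to 3 (4 columns) and rows 9 to 10 (2 columns), so the result is rows 1 to 3, matching the expected output. Because groups chain, a long run of adjacent TRUE rows forms one group; a strict "within one row of each other" window would need a different grouping rule.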

Answer 1

This code should answer the first (TRUE/FALSE) question:

r <- c()
colum <- c()
for (colu in 2:ncol(example_table)) {
  # Replace the column with TRUE/FALSE: is each value equal to the column maximum?
  example_table[, colu] <- example_table[, colu] == max(example_table[, colu])
  val <- which(example_table[, colu])  # row indexes holding TRUE
  r <- append(r, val)  # append row indexes
  # one column can contain more than one TRUE, so repeat its index once per TRUE
  colum <- append(colum, rep(colu, length(val)))
}
true_values <- cbind(r, colum)  # matrix of (row, column) positions

Output:

> example_table
     V1 V2 V3 V4
1 18:02  5  8  1
2 14:05  7  1  7
3 19:27  7  6  1

# After the for loop:

> example_table
     V1    V2    V3    V4
1 18:02 FALSE  TRUE FALSE
2 14:05  TRUE FALSE  TRUE
3 19:27  TRUE FALSE FALSE

> true_values
     r colum
[1,] 2     2
[2,] 3     2
[3,] 1     3
[4,] 2     4

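Here r is the row index and colum is the index of the column containing the TRUE value. Note that example_table[, colu] == max(example_table[, colu]) returns a vector of TRUE/FALSE values, and that colum <- append(colum, rep(colu, length(val))) is needed so that colum stays the same length as r; otherwise cbind() would run into dimension problems.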
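For reference, the same (row, column) pairs can be obtained without growing vectors inside a loop. This sketch rebuilds the three-row example_table shown in the output, computes all the column-wise max tests at once with sapply(), and lets which(..., arr.ind = TRUE) list the position of every TRUE. Unlike the loop, it leaves example_table itself unchanged.

```r
# The three-row example from the output above
example_table <- data.frame(
  V1 = c("18:02", "14:05", "19:27"),
  V2 = c(5, 7, 7),
  V3 = c(8, 1, 6),
  V4 = c(1, 7, 1)
)

num_cols <- 2:ncol(example_table)  # skip the time column

# Logical matrix: TRUE where a value equals its column's maximum
is_max <- sapply(example_table[num_cols], function(x) x == max(x))

# (row, col) of every TRUE, in column-major order like the loop's output
true_values <- which(is_max, arr.ind = TRUE)

# which() numbers columns within is_max; map them back to example_table
true_values[, "col"] <- num_cols[true_values[, "col"]]
true_values
```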

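For the second question, you now have the indexes of the rows that contain TRUE values. You can then write code that selects a row when the row directly above or below it also contains a TRUE value (the any() function should be a good fit here). Then subset() the original data frame with those row indexes.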
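A minimal sketch of that neighbour check, using the question's eleven rows. It assumes "above or below" means the immediately adjacent rows; apply(m, 1, any) marks rows with any TRUE, and shifted copies of that vector (padded with FALSE at the ends) stand in for the rows above and below.

```r
# The eleven rows from the question
df <- data.frame(
  TS = sprintf("18:02:%02d", 44:54),
  Device1.max = c(FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Device2.max = c(FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,  FALSE),
  Device3.max = c(TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Device4.max = c(FALSE, FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,  FALSE, FALSE)
)

flag_cols <- grep("^Device.*\\.max$", names(df), value = TRUE)
m <- as.matrix(df[flag_cols])

# TRUE for every row holding at least one TRUE flag
has_true <- apply(m, 1, any)

# Shift by one row in each direction, padding the ends with FALSE
above <- c(FALSE, head(has_true, -1))  # row i - 1 has a TRUE
below <- c(tail(has_true, -1), FALSE)  # row i + 1 has a TRUE

# Keep a TRUE row only if an adjacent row also has a TRUE
keep <- has_true & (above | below)
subset(df, keep)
```

On the question's data this keeps rows 1, 2, 3, 9 and 10, i.e. both groups. Reducing that to rows 1 to 3 still requires counting how many Device columns are TRUE within each group, as the question describes.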
