R column comparison and filter

Question

I have a data frame that looks like this, with dates as the column names:

2013_11 | 2013_12 | 2014_01 | 2014_02 | 2014_03 |

 NA | NA | 3  | 3  | N  |
  2 | 2  | 3  | NA | NA |
 NA | NA | NA | NA | NA |

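I need to write some kind of logical function to filter for the rows I'm looking for. I only want the rows that have no number in any 2013 month (the first two columns) but DO have at least one number in any of the 2014 columns.

So the code should return only the first row: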

NA | NA | 3  | 3  | N  |

I can't figure out the most efficient way to do this, since I have about 8 million rows.

Answer 1

You could try

indx1 <- grep('2013', colnames(df))
indx2 <- grep('2014', colnames(df))
df[!rowSums(!is.na(df[indx1]))&!!rowSums(!is.na(df[indx2])),]
#   2013_11 2013_12 2014_01 2014_02 2014_03
#1      NA      NA       3       3       N
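
The one-liner above packs several steps into one expression. Here is a minimal sketch of the same logic with named intermediate steps, assuming the column names always contain the year. The names `n_2013`, `n_2014`, `keep` and `result` are introduced here for illustration only, and the sample data mirrors the Data section of this answer.

```r
# Sample data mirroring the Data section of this answer
df <- data.frame(
  `2013_11` = c(NA, 2L, NA),
  `2013_12` = c(NA, 2L, NA),
  `2014_01` = c(3L, 3L, NA),
  `2014_02` = c(3L, NA, NA),
  `2014_03` = c("N", NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

indx1 <- grep("2013", colnames(df))
indx2 <- grep("2014", colnames(df))

# Count the non-missing values per row in each group of columns
n_2013 <- rowSums(!is.na(df[indx1]))
n_2014 <- rowSums(!is.na(df[indx2]))

# Keep rows with nothing in 2013 and at least one value in 2014
keep <- n_2013 == 0 & n_2014 > 0
result <- df[keep, ]
```

Note that any non-missing entry counts as a value here, including the text "N" in the sample data. Because `rowSums` works on the whole logical matrix at once, there is no explicit loop over rows, which is probably what matters most with millions of rows.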

Or you could use

i1 <- Reduce(`&`, lapply(df[indx1], is.na))                  # every 2013 value is NA
i2 <- Reduce(`|`, lapply(df[indx2], function(x) !is.na(x)))  # at least one 2014 value is not NA
df[i1 & i2, ]
# 2013_11 2013_12 2014_01 2014_02 2014_03
#1      NA      NA       3       3       N

Data

df <- structure(list(`2013_11` = c(NA, 2L, NA), `2013_12` = c(NA, 2L, 
NA), `2014_01` = c(3L, 3L, NA), `2014_02` = c(3L, NA, NA), `2014_03` = c("N", 
NA, NA)), .Names = c("2013_11", "2013_12", "2014_01", "2014_02", 
"2014_03"), class = "data.frame", row.names = c(NA, -3L))

Answer 2

Have you considered using grep? I would create a function to do this, as shown below. It uses R's any, all, is.na and an if statement inside a for loop.

grep_function <- function(src, condition1, condition2) {
    # find the two groups of columns once, outside the loop
    cols1 <- grepl(condition1, names(src))
    cols2 <- grepl(condition2, names(src))
    for (i in seq_len(nrow(src))) {
        data_condition1 <- src[i, cols1]
        data_condition2 <- src[i, cols2]
        if (all(is.na(data_condition1)) && any(!is.na(data_condition2))) {
            # do something here to each individual observation
        } else {
            # do something for those that do not meet your criteria
        }
    }
}

Example: grep_function(your_data_here, "2013", "2014")
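
As one way to fill in the placeholders, here is a sketch that records which rows pass the test and returns them. The function name `filter_rows_loop` and the data frame `sample_df` are assumptions made for illustration, not part of the answer above.

```r
# Hypothetical variant of the loop that returns the matching rows
filter_rows_loop <- function(src, condition1, condition2) {
  cols1 <- grepl(condition1, names(src))
  cols2 <- grepl(condition2, names(src))
  keep <- logical(nrow(src))  # starts as FALSE for every row
  for (i in seq_len(nrow(src))) {
    data_condition1 <- src[i, cols1, drop = FALSE]
    data_condition2 <- src[i, cols2, drop = FALSE]
    # TRUE when the first group is all NA and the second has a value
    keep[i] <- all(is.na(data_condition1)) && any(!is.na(data_condition2))
  }
  src[keep, , drop = FALSE]
}

# Illustrative data shaped like the example in the question
sample_df <- data.frame(
  `2013_11` = c(NA, 2, NA),
  `2013_12` = c(NA, 2, NA),
  `2014_01` = c(3, 3, NA),
  `2014_02` = c(3, NA, NA),
  `2014_03` = c(NA, NA, NA),
  check.names = FALSE
)

matched <- filter_rows_loop(sample_df, "2013", "2014")
```

With about 8 million rows, a row-by-row loop like this is likely to be far slower than a vectorized approach, so it is probably best suited to small data or to cases where each row needs custom handling.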

Answer 3

Or you could use SQL (it's a bit verbose, but it may be more readable for some people):

require('sqldf')

a=data.frame("2013_11"=c(NA,2,NA), "2013_12"=c(NA,2,NA), "2014_01" =c(3,3,NA),
             "2014_02" =c(3,NA,NA) ,"2014_03" =c(NA,NA,NA))

sqldf("select * from a where 
        case when X2013_11 is null then 0 else 1 end +
        case when X2013_12 is null then 0 else 1 end = 0 
        and
        case when X2014_01 is null then 0 else 1 end +
        case when X2014_02 is null then 0 else 1 end +
        case when X2014_03 is null then 0 else 1 end > 0
      ")

 X2013_11 X2013_12 X2014_01 X2014_02 X2014_03
       NA       NA        3        3       NA
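
With more than a handful of month columns, writing each `case when` by hand gets tedious. Here is a sketch of how the same query text could be assembled from the column names using base R only. The helper `count_non_null` and the variable names are hypothetical, and the column names are assumed to carry the `X` prefix that `data.frame` added above.

```r
# Column names as they appear in `a` (assumed, with the "X" prefix)
cols <- c("X2013_11", "X2013_12", "X2014_01", "X2014_02", "X2014_03")
cols_2013 <- grep("2013", cols, value = TRUE)
cols_2014 <- grep("2014", cols, value = TRUE)

# Hypothetical helper: SQL expression counting the non-null columns
count_non_null <- function(cols) {
  paste(sprintf("case when %s is null then 0 else 1 end", cols),
        collapse = " + ")
}

query <- sprintf("select * from a where %s = 0 and %s > 0",
                 count_non_null(cols_2013), count_non_null(cols_2014))

# The string could then be passed on as sqldf(query), assuming sqldf is
# loaded and the data frame `a` exists in the calling environment.
```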
