R: filter rows based on multiple partial strings applied to multiple columns
Sample of the dataset:
diag01 <- as.factor(c("S7211","J47","J47","K729","M2445","Z509","Z488","R13","L893","N318","L0311","S510","A047","D649"))
diag02 <- as.factor(c("K590","D761","J961","T501","M8580","R268","T831","G8240","B9688","G550","E162","T8902","E86","I849"))
diag03 <- as.factor(c("F058","M0820","E877","E86","G712","R32","A408","E888","G8220","C794","T68","L0310","M1094","D469"))
diag04 <- as.factor(c("E86","C845","R790","I420","G4732","R600","L893","R509","T913","C795","M8412","G8212","L891","L0311"))
diag05 <- as.factor(c("R001","N289","E876","E871","H659","R4589","N508","B99","I209","C773","T921","Q070","H919","L033"))
diag06 <- as.factor(c("I951","E877","S7240","I500","H901","E119","Z223","K590","I959","C509","G819","F719","Z290","R13"))
df <- data.frame(diag01, diag02, diag03, diag04, diag05, diag06)
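I want to keep entire rows that have a partial string match anywhere in a given list of columns (e.g. diag01, diag02, etc.). I can do this on a single column, for example: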
junk <- filter(df, grepl(pattern="^E11|^E16|^E86|^E87|^E88", diag02))
But I need to apply this to multiple columns (the original dataset has 216 columns and >1,000,000 rows). Among other options, I have tried:
junk <- filter(df, grepl(pattern="^E11|^E16|^E86|^E87|^E88", df[,c(1:6)]))
junk <- apply(df, 1, function(r) any(r %in% grepl(pattern="^E11|^E16|^E86|^E87|^E88")))
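I need the entire row, and ideally I would like to restrict the filter condition to a given list of columns, because values in other columns may also start with the listed partial strings.
I have genuinely tried to find a solution, but my knowledge of R is clearly not up to it.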
Maybe we need:
library(dplyr)
# Keep rows where any column starts with one of the listed codes
df %>%
  filter_all(any_vars(grepl(pattern="^(E11|E16|E86|E87|E88)", .)))
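The question also asks to restrict the match to a given list of columns. A minimal sketch of that, assuming dplyr 1.0.4 or later (where `if_any()` is available); the `cols` vector and the three-row stand-in data frame are hypothetical and only illustrate the idea:

```r
library(dplyr)

# Small stand-in for the question's data (values taken from the sample)
df <- data.frame(
  diag01 = c("S7211", "J47", "A047"),
  diag02 = c("K590", "E162", "E86"),
  diag03 = c("E877", "T68", "M1094"),
  stringsAsFactors = TRUE
)

pattern <- "^(E11|E16|E86|E87|E88)"
cols <- c("diag01", "diag02")  # hypothetical subset of columns to search

# Keep rows where at least one of the selected columns matches;
# diag03 is ignored even though its first value starts with E87
res <- df %>%
  filter(if_any(all_of(cols), ~ grepl(pattern, .x)))
```

With this stand-in data only the second and third rows are kept, because the match in diag03 on the first row falls outside `cols`.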
Or with purrr and dplyr:
library(dplyr)
library(purrr)
# One logical vector per column, combined element-wise with OR, then used as a row index
df %>%
  map(~grepl(pattern="^E11|^E16|^E86|^E87|^E88", .)) %>%
  reduce(`|`) %>%
  df[.,]
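The same map-and-reduce idea can be sketched in base R, limited to a chosen set of columns. This is an illustration rather than a tested solution for the full 216-column dataset; `cols` and the small data frame below are hypothetical stand-ins:

```r
# Small stand-in for the question's data (values taken from the sample)
df <- data.frame(
  diag01 = c("S7211", "J47", "A047"),
  diag02 = c("K590", "E162", "E86"),
  diag03 = c("E877", "T68", "M1094"),
  stringsAsFactors = TRUE
)

pattern <- "^(E11|E16|E86|E87|E88)"
cols <- c("diag01", "diag02")  # hypothetical subset of columns to search

# One logical vector per selected column (grepl works column-wise, not row-wise)
hits <- lapply(df[cols], grepl, pattern = pattern)

# Combine the vectors element-wise with OR to get one flag per row
keep <- Reduce(`|`, hits)

# Subset the original data frame, keeping all columns
res <- df[keep, , drop = FALSE]
```

Working column by column keeps `grepl()` vectorised, which is likely to matter more on a million rows than on this toy example.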