Grep in two or more columns of a dataframe

Question

I have a data frame and I want to find out whether certain strings appear in certain columns, and then get their row numbers. I am using this:

keywords <- c("knowledge management", "gestión del conocimiento")
npox <- grep(paste(keywords, collapse = "|"), full[,c(7)], ignore.case = T)

However, this only works for a single column, full[,c(7)], not for two or more. Does anyone know how I can do this?
Sample data (csv): https://tempsend.com/rxucj

Answer 1

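Instead of using grep on a single column, use grepl on the whole data frame converted to a character matrix. This returns a logical vector with one element per cell. Reshape that logical vector into a matrix with the same dimensions as the original data frame, then run which with arr.ind = TRUE. This gives you the row and column of every match of the regular expression.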

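keywords <- c("knowledge management", "gestión del conocimiento")
# Search every cell: as.matrix() turns the data frame into a character matrix
npox <- grepl(paste(keywords, collapse = "|"), as.matrix(full), ignore.case = TRUE)

# Reshape the logical vector to the data frame's shape and report row/col of each match
which(matrix(npox, nrow = nrow(full)), arr.ind = TRUE)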
#>      row col
#> [1,]  16   8
#> [2,]  15   9
#> [3,]  15  10
#> [4,]  16  15
#> [5,]  16  23

For example, we can see that there is a match at row 16, column 8. We can confirm this with:

full[16, 8]
#> [1] "The Impact of Human Resource Management Practices, Organisational 
#> Culture, Organisational Innovation and Knowledge Management on Organisational
#> Performance in Large Saudi Organisations: Structural Equation Modeling With 
#> Conceptual Framework"

We can see "Knowledge Management" in this cell.
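The steps above can be tried out on a small made-up data frame. This is only a sketch: the `toy` data frame, its column names, and the `pattern`, `hits` and `toy_matches` variables are assumptions for illustration and are not part of the asker's data.

```r
# Hypothetical toy data frame standing in for `full`
toy <- data.frame(
  id    = c(1, 2, 3),
  title = c("Knowledge Management in firms", "Other topic", "Gestión del conocimiento"),
  notes = c("none", "about knowledge management", "none"),
  stringsAsFactors = FALSE
)

keywords <- c("knowledge management", "gestión del conocimiento")
pattern  <- paste(keywords, collapse = "|")

# grepl() treats the character matrix as one long vector (column by column)
hits <- grepl(pattern, as.matrix(toy), ignore.case = TRUE)

# Reshape to the data frame's dimensions and locate the TRUE cells
toy_matches <- which(matrix(hits, nrow = nrow(toy)), arr.ind = TRUE)
toy_matches
#>      row col
#> [1,]   1   2
#> [2,]   3   2
#> [3,]   2   3
```

The matches come back ordered by column, because the matrix is scanned column by column.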

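If you want to restrict the results to certain columns, probably the easiest way is to filter the results afterwards. For example, suppose I store all the matches in full in a variable called matches: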

matches <- which(matrix(npox, nrow = nrow(full)), arr.ind = TRUE)

But I am only interested in matches in columns 7, 8 and 9. Then I can do:

# drop = FALSE keeps the result a matrix even if only one row is left
matches[matches[, "col"] %in% c(7, 8, 9), , drop = FALSE]
#>      row col
#> [1,]  16   8
#> [2,]  15   9
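Since the question asks for row numbers only, another option is to search just the columns of interest and collapse the result to rows. The following is a minimal sketch of that idea, not part of the original answer: the helper name `rows_matching` and the `toy` data are hypothetical.

```r
# Hypothetical helper: row numbers that have a match in any of the chosen columns
rows_matching <- function(df, pattern, cols) {
  # Keep only the chosen columns; drop = FALSE preserves the data frame shape
  sub  <- as.matrix(df[, cols, drop = FALSE])
  hits <- matrix(grepl(pattern, sub, ignore.case = TRUE), nrow = nrow(df))
  # A row qualifies if at least one of its cells matched
  which(rowSums(hits) > 0)
}

# Made-up data for illustration only
toy <- data.frame(
  a = c("knowledge management", "x", "y"),
  b = c("x", "y", "Knowledge Management"),
  c = c("x", "knowledge management", "y"),
  stringsAsFactors = FALSE
)

res <- rows_matching(toy, "knowledge management", cols = c(1, 2))
res
#> [1] 1 3
```

Row 2 is not returned here because its only match is in column 3, which was left out of the search.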
