How can I add a counter for each time a row contains a character in a string?
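I have a data frame that mostly contains character values, which I want to match against a list of patterns. I am trying to add a column at the end that counts how many hits from the list occur in each row. Basically, what I want to do is:
patterns <- c("Yes", "No", "Maybe")
df <- data.frame(first_column = c("Why", "Sure", "But", ...),
                 second_column = c("Yes", "Okay", "If Only", ...),
                 third_column = c("No", "When", "Maybe so", ...),
                 fourth_column = c("But", "I won't", "Truth", ...)
)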
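After running the code, under a fifth column labeled "counter", you would see 2, 0, 1, ...
Right now I do this with a pair of nested for loops with an if statement inside. It works on a toy dataset, but I suspect it will choke if I try it on the full-size data. Is there a better way using dplyr, grepl, or lapply? My instinct says dplyr, but I'm not sure how to go about it. My code is below: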
filename = choose.files(caption='Select File')
cases = read.csv(filename)
cases = cbind(cases, counter=0)
l = nrow(cases)
col = ncol(cases)
for (i in 1:l){
  for (j in 1:col){
    if(cases[i,j] %in% patterns)
    {
      cases$counter[i]=cases$counter[i]+1
    }
  }
}
Try this:
df$counter <- rowSums(vapply(df, function(x, p) grepl(p, x), integer(nrow(df)), paste0(patterns, collapse = "|")))
Output:
> df
  first_column second_column third_column fourth_column counter
1          Why           Yes           No           But       2
2         Sure          Okay         When       I won't       0
3          But       If Only     Maybe so         Truth       1
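Note that this counts substring matches: grepl flags a cell whenever any pattern occurs anywhere inside it, which is why "Maybe so" counts as a hit in row 3. If only whole-cell matches should count, %in% can be swapped in. Below is a minimal sketch comparing the two on the sample data; the names regex, substring_hits, and exact_hits are illustrative and not part of the original answer.

```r
patterns <- c("Yes", "No", "Maybe")
df <- data.frame(
  first_column  = c("Why", "Sure", "But"),
  second_column = c("Yes", "Okay", "If Only"),
  third_column  = c("No", "When", "Maybe so"),
  fourth_column = c("But", "I won't", "Truth"),
  stringsAsFactors = FALSE
)

# Substring matching: a cell is a hit if any pattern appears anywhere in it.
regex <- paste(patterns, collapse = "|")
substring_hits <- rowSums(
  vapply(df, function(x) grepl(regex, x), logical(nrow(df)))
)

# Exact matching: a cell is a hit only if it equals one of the patterns.
exact_hits <- rowSums(
  vapply(df, function(x) x %in% patterns, logical(nrow(df)))
)

print(substring_hits)  # row 3 is counted because "Maybe so" contains "Maybe"
print(exact_hits)      # row 3 is not counted
```

Which of the two is right depends on whether a cell like "Maybe so" should count as a hit for "Maybe".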
In dplyr, we can use rowwise and c_across:
library(dplyr)
df %>% rowwise() %>% mutate(counter = sum(c_across(everything()) %in% patterns))
#  first_column second_column third_column fourth_column counter
#  <chr>        <chr>         <chr>        <chr>           <int>
#1 Why          Yes           No           But                 2
#2 Sure         Okay          When         I won't             0
#3 But          If Only       Maybe so     Truth               0
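Since rowwise() evaluates the expression once per row, it can get slow on a large data frame. As a possible alternative (a sketch, not part of the original answer), the same exact-match count can be computed column by column with across() and rowSums(). The name result is illustrative.

```r
library(dplyr)

patterns <- c("Yes", "No", "Maybe")
df <- data.frame(
  first_column  = c("Why", "Sure", "But"),
  second_column = c("Yes", "Okay", "If Only"),
  third_column  = c("No", "When", "Maybe so"),
  fourth_column = c("But", "I won't", "Truth"),
  stringsAsFactors = FALSE
)

# across() yields one logical column per input column (TRUE where the cell
# equals one of the patterns); rowSums() then counts the TRUEs in each row.
result <- df %>%
  mutate(counter = rowSums(across(everything(), ~ .x %in% patterns)))

print(result)
```

Like the rowwise version, this counts exact matches only, so "Maybe so" is not a hit.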
We can use map and reduce to do this column-wise:
library(dplyr)
library(purrr)
df %>%
  mutate(counter = map(patterns, ~ rowSums(cur_data() == .x)) %>%
           reduce(`+`))
#  first_column second_column third_column fourth_column counter
#1          Why           Yes           No           But       2
#2         Sure          Okay         When       I won't       0
#3          But       If Only     Maybe so         Truth       0
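For comparison, roughly the same idea can be written in base R with lapply and Reduce. This is only a sketch, assuming exact matches are wanted and the data has no NA cells; per_pattern and counter are illustrative names. Also worth knowing: cur_data() was deprecated in dplyr 1.1.0 in favor of pick().

```r
patterns <- c("Yes", "No", "Maybe")
df <- data.frame(
  first_column  = c("Why", "Sure", "But"),
  second_column = c("Yes", "Okay", "If Only"),
  third_column  = c("No", "When", "Maybe so"),
  fourth_column = c("But", "I won't", "Truth"),
  stringsAsFactors = FALSE
)

# For each pattern, compare the whole data frame against it at once and
# count the matching cells per row. rowSums(..., na.rm = TRUE) would skip
# NA cells if the real data contains any.
per_pattern <- lapply(patterns, function(p) rowSums(df == p))

# Add the per-pattern counts together element-wise.
counter <- Reduce(`+`, per_pattern)

df$counter <- counter
print(df)
```

This loops over the patterns rather than the rows, so the number of iterations stays small even when the data frame is large.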
Data:
df <- structure(list(
  first_column = c("Why", "Sure", "But"),
  second_column = c("Yes", "Okay", "If Only"),
  third_column = c("No", "When", "Maybe so"),
  fourth_column = c("But", "I won't", "Truth")
), class = "data.frame", row.names = c(NA, -3L))