Find multiple partial strings in each row and create a variable with the column the string is in
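I've found similar questions, but none of them has solved my problem.
I have a huge data frame, and I'm trying to find where any of a set of strings appears.
Here is my sample data: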
var1 <- c("Goats","Sheep","Pigs","Dog","Zebu","Donkeys","Water buffaloes","Dromedary camel","Dog","Pig")
var2 <- c("cats","Dog","birds","rabbits","Plant","fish","guinea pigs","cat","Mouse","dog")
var3 <- c("cats","Dog","birds","rabbits","horses","Sheep","Pigs","Tree","Zebu","Donkeys")
var4 <- c("Plant","Ant","Bee","Tree","Water buffaloes","Donkeys","Water buffaloes","Dromedary camel","Dog","Pig")
df <- data.frame(var1, var2, var3, var4)
head(df)
     var1    var2    var3            var4
1   Goats    cats    cats           Plant
2   Sheep     Dog     Dog             Ant
3    Pigs   birds   birds             Bee
4     Dog rabbits rabbits            Tree
5    Zebu   Plant  horses Water buffaloes
6 Donkeys    fish   Sheep         Donkeys
I can find the rows and the columns in which Tree and Plant occur:
strings = c( "Tree","Plant")
which(apply(df, 1, function(x) any(grepl(paste(strings, collapse = "|"), x)))) # ROW
[1] 1 4 5 8
which(apply(df, 2, function(x) any(grepl(paste(strings, collapse = "|"), x)))) # COLUMN
var2 var3 var4
   2    3    4
But I can't figure out how to put the column name (or index) into a new variable for each row. This is what I want:
> head(df)
     var1    var2    var3            var4   x1
1   Goats    cats    cats           Plant var4
2   Sheep     Dog     Dog             Ant   NA
3    Pigs   birds   birds             Bee   NA
4     Dog rabbits rabbits            Tree var4
5    Zebu   Plant  horses Water buffaloes var2
6 Donkeys    fish   Sheep         Donkeys   NA
I think something with str_detect or str_which would work, but I'm not sure how to do it across all columns.
Something like:
df <- df %>%
  mutate(words = ifelse(any_vars(str_detect(strings))))
Thanks for your time and help.
We can do this with rowwise:
library(dplyr)
cols <- names(df)
df %>%
  rowwise() %>%
  # compare every cell in the row against `strings` (exact match)
  mutate(x1 = toString(cols[c_across(everything()) %in% strings])) %>%
  ungroup()
#    var1            var2        var3    var4            x1
#    <chr>           <chr>       <chr>   <chr>           <chr>
#  1 Goats           cats        cats    Plant           "var4"
#  2 Sheep           Dog         Dog     Ant             ""
#  3 Pigs            birds       birds   Bee             ""
#  4 Dog             rabbits     rabbits Tree            "var4"
#  5 Zebu            Plant       horses  Water buffaloes "var2"
#  6 Donkeys         fish        Sheep   Donkeys         ""
#  7 Water buffaloes guinea pigs Pigs    Water buffaloes ""
#  8 Dromedary camel cat         Tree    Dromedary camel "var3"
#  9 Dog             Mouse       Zebu    Dog             ""
# 10 Pig             dog         Donkeys Pig             ""
In base R, with apply:
df$x1 <- apply(df, 1, function(x) toString(cols[x %in% strings]))
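Both versions above use %in%, which matches whole cell values only, and they return "" for rows with no match. If partial matches and NA are wanted, as in the question, a base R variant along the following lines might work. This is only a sketch: it reuses the grepl pattern from the question, the names pattern, m and hits are made up for illustration, and it assumes the search strings contain no regular-expression metacharacters.

```r
# Sample data from the question (first six rows only, to keep the sketch short)
df <- data.frame(
  var1 = c("Goats", "Sheep", "Pigs", "Dog", "Zebu", "Donkeys"),
  var2 = c("cats", "Dog", "birds", "rabbits", "Plant", "fish"),
  var3 = c("cats", "Dog", "birds", "rabbits", "horses", "Sheep"),
  var4 = c("Plant", "Ant", "Bee", "Tree", "Water buffaloes", "Donkeys"),
  stringsAsFactors = FALSE
)
strings <- c("Tree", "Plant")

# One regular expression that matches any of the search strings
# (assumption: the strings contain no regex metacharacters)
pattern <- paste(strings, collapse = "|")

# Logical matrix with the same shape as df: TRUE where a cell contains a match
m <- as.matrix(df)
hits <- matrix(grepl(pattern, m), nrow = nrow(m),
               dimnames = list(NULL, colnames(m)))

# For each row, join the names of the matching columns; NA when nothing matches
df$x1 <- apply(hits, 1, function(h) {
  if (any(h)) toString(colnames(hits)[h]) else NA_character_
})

print(df)
```

Because grepl is case-sensitive by default, "plant" would not match here; passing ignore.case = TRUE to grepl should relax that. A row that matches in several columns gets them comma-separated, as with toString in the answers above.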