Separate a String using Tidyr's "separate" into Multiple Columns and then Create a New Column with Counts
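So I have the basic data frame below, which contains long strings separated by commas. I used Tidyr's "separate" to create new columns.
How can I add another new column that counts, for each person, how many of the new columns contain an answer (i.e. are not NA)?
I suppose the count could be taken either before or after separating, e.g. by counting how many comma-separated elements each string has?
Any help would be appreciated. I would like to stay within the Tidyverse and dplyr.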
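library(dplyr); library(tidyr)
Name <- c("John", "Chris", "Andy")
Goal <- c("Go back to school,Learn to drive,Learn to cook", "Go back to school,Get a job,Learn a new Skill,Learn to cook", "Learn to drive,Learn to Cook")
df <- tibble(Name, Goal)  # data_frame() is deprecated; tibble() replaces it
# Store the result separately so df still has the Goal column for the answer below
df_sep <- df %>% separate(Goal, c("Goal1", "Goal2", "Goal3", "Goal4"), sep = ",")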
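We can try str_count from stringr. A string with n commas has n + 1 elements, so counting the commas and adding 1 gives the number of goals per person. remove = FALSE keeps the original Goal column so it can still be counted, and select(-Goal) drops it at the end: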
library(dplyr)
library(tidyr)
library(stringr)
df %>%
  separate(Goal, paste0("Goal", 1:4), sep = ",", remove = FALSE) %>%  # keep Goal for counting
  mutate(Count = str_count(Goal, ",") + 1) %>%  # n commas => n + 1 goals
  select(-Goal)
#   Name  Goal1             Goal2          Goal3             Goal4         Count
#   <chr> <chr>             <chr>          <chr>             <chr>         <dbl>
# 1 John  Go back to school Learn to drive Learn to cook     <NA>              3
# 2 Chris Go back to school Get a job      Learn a new Skill Learn to cook     4
# 3 Andy  Learn to drive    Learn to Cook  <NA>              <NA>              2
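The question also mentions counting after separating. A minimal sketch of that route, assuming dplyr >= 1.0.0 (for across()), counts the non-NA Goal columns in each row and compares the result with the comma count. CountCommas is a hypothetical column name used only for the comparison, and fill = "right" just silences the warning separate() gives for rows with fewer than four goals.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Same example data as in the question
df <- tibble(
  Name = c("John", "Chris", "Andy"),
  Goal = c("Go back to school,Learn to drive,Learn to cook",
           "Go back to school,Get a job,Learn a new Skill,Learn to cook",
           "Learn to drive,Learn to Cook")
)

res <- df %>%
  # Count before separating: n commas means n + 1 goals
  mutate(CountCommas = str_count(Goal, ",") + 1) %>%
  # fill = "right" pads short rows with NA on the right, without a warning
  separate(Goal, paste0("Goal", 1:4), sep = ",", fill = "right") %>%
  # Count after separating: how many Goal columns are not NA in each row
  mutate(Count = rowSums(across(starts_with("Goal"), ~ !is.na(.x))))

res
```

Both counts agree for this data. One difference to keep in mind: if a string had more than four goals, separate() would drop the extras (with a warning), so the non-NA count would be capped at 4 while the comma count would not.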