Failed to get data in a single comma-separated row, grouped by the values of another column
I have a data frame with many variables; two of them are shown in the sample dataset test, created with the code below:
test <- data.frame(row_numb = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
words = c('apply','assistance','benefit','compass','medical','online','renew','meet','service','website','center','country','country','develop','highly','home','major','obtain'))
I am trying to join the words in the words column into a column Dictionary of a new data frame fdata, grouped by row_numb and separated by commas, using the code below:
fdata <- test %>%
  select(row_numb, words) %>%
  group_by(row_numb) %>%
  unite(Dictionary, words, sep=",")
I am not able to get the expected result:
row_numb Dictionary
1 apply, assistance, benefit, compass, medical, online, renew
2 meet, service.... and so forth
Can anyone help me find the mistake I am making?
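unite is for pasting multiple columns together, not for aggregating a single column. To aggregate, use summarise with paste(..., collapse = ', ') or, for the specific case of a comma-separated string, toString: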
library(tidyverse)
test <- data.frame(row_numb = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
words = c('apply','assistance','benefit','compass','medical','online','renew','meet','service','website','center','country','country','develop','highly','home','major','obtain'))
test %>% group_by(row_numb) %>% summarise(words = toString(words))
#> # A tibble: 3 x 2
#> row_numb words
#> <dbl> <chr>
#> 1 1 apply, assistance, benefit, compass, medical, online, renew
#> 2 2 meet, service, website
#> 3 3 center, country, country, develop, highly, home, major, obtain
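The paste(..., collapse = ', ') variant mentioned above is not shown in the answer, so here is a minimal sketch of it. It assumes dplyr is installed and reuses the column name Dictionary and the data frame name fdata from the question; stringsAsFactors = FALSE is added only so the example behaves the same on older R versions.

```r
library(dplyr)

# Sample data from the question; stringsAsFactors = FALSE keeps words as character
test <- data.frame(
  row_numb = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
  words = c('apply', 'assistance', 'benefit', 'compass', 'medical', 'online',
            'renew', 'meet', 'service', 'website', 'center', 'country',
            'country', 'develop', 'highly', 'home', 'major', 'obtain'),
  stringsAsFactors = FALSE
)

# Collapse all words within each row_numb group into one string
fdata <- test %>%
  group_by(row_numb) %>%
  summarise(Dictionary = paste(words, collapse = ", "))

# Base R alternative without dplyr: aggregate() with toString()
fdata_base <- aggregate(words ~ row_numb, data = test, FUN = toString)
```

paste() with collapse lets you pick any separator, while toString() always uses ", ". If repeated words such as country should appear only once, wrapping the column in unique() before collapsing is one option.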
To use unite, specify the name of the new column and the columns that should be pasted together, optionally with the sep argument, e.g.
iris %>% unite(sepal_l_w, Sepal.Length, Sepal.Width, sep = ' / ') %>% head()
#> sepal_l_w Petal.Length Petal.Width Species
#> 1 5.1 / 3.5 1.4 0.2 setosa
#> 2 4.9 / 3 1.4 0.2 setosa
#> 3 4.7 / 3.2 1.3 0.2 setosa
#> 4 4.6 / 3.1 1.5 0.2 setosa
#> 5 5 / 3.6 1.4 0.2 setosa
#> 6 5.4 / 3.9 1.7 0.4 setosa
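To make the difference concrete for the data in the question, the sketch below applies unite to the single words column. It assumes tidyr is installed; the name united is only illustrative. Because unite works across columns within each row, the number of rows does not change.

```r
library(tidyr)

# Sample data from the question
test <- data.frame(
  row_numb = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
  words = c('apply', 'assistance', 'benefit', 'compass', 'medical', 'online',
            'renew', 'meet', 'service', 'website', 'center', 'country',
            'country', 'develop', 'highly', 'home', 'major', 'obtain'),
  stringsAsFactors = FALSE
)

# unite() pastes columns together row by row; with only one input column
# it effectively just renames words to Dictionary
united <- unite(test, Dictionary, words, sep = ",")

nrow(united)  # still 18 rows: nothing was aggregated across rows
```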
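Another general pattern that works for this kind of task is nest() followed by mutate()/map(). It is useful when there is no function like toString() that does exactly what you need for the next step. It is still only three lines of code: first nest() your data, then flatten the list structure, then paste/collapse the pieces together.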
library(tidyverse)
test %>%
  nest(-row_numb) %>%
  mutate(Dictionary = map(data, unlist),
         Dictionary = map_chr(Dictionary, paste, collapse = ", "))
#> # A tibble: 3 x 3
#> row_numb data Dictionary
#> <dbl> <list> <chr>
#> 1 1 <tibble [7 × … apply, assistance, benefit, compass, medical, o…
#> 2 2 <tibble [3 × … meet, service, website
#> 3 3 <tibble [8 × … center, country, country, develop, highly, home…
Created on 2018-08-14 by the reprex package (v0.2.0).
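The nest(-row_numb) call above reflects tidyr as it was in 2018. As a hedged sketch, assuming tidyr 1.0 or later, where the columns to nest are given as a named argument, the same pattern can be written as follows. The name nested is only illustrative, and the flatten and collapse steps are combined into one map_chr() call.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Sample data from the question
test <- data.frame(
  row_numb = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
  words = c('apply', 'assistance', 'benefit', 'compass', 'medical', 'online',
            'renew', 'meet', 'service', 'website', 'center', 'country',
            'country', 'develop', 'highly', 'home', 'major', 'obtain'),
  stringsAsFactors = FALSE
)

nested <- test %>%
  # Nest the words column; one row per remaining column value (row_numb)
  nest(data = c(words)) %>%
  # For each nested tibble, collapse its words column into a single string
  mutate(Dictionary = map_chr(data, ~ paste(.x$words, collapse = ", "))) %>%
  # Drop the list-column once the string has been built
  select(-data)
```

The function passed to map_chr() can be swapped for any per-group computation that returns a single string, which is the main reason to reach for this pattern over summarise().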