Counting unique list items
Suppose I have a data table, dt.recipes, whose ingredients column holds lists of various items, for example:
recipe_id ingredients
1 apple, banana, cucumber, water
2 apple, meat, water
3 water
How can I create a table that counts how often each unique item occurs in dt.recipes$ingredients? In other words, I am looking for a result similar to this:
ingredient count
water 3
apple 2
banana 1
cucumber 1
meat 1
Any pointers would be greatly appreciated. Thanks in advance!
You could do this:
as.data.frame(table(unlist(strsplit(df$ingredients, ", "))))
#> Var1 Freq
#> 1 apple 2
#> 2 banana 1
#> 3 cucumber 1
#> 4 meat 1
#> 5 water 3
Data
df <- structure(list(recipe_id = 1:3,
ingredients = c("apple, banana, cucumber, water",
"apple, meat, water",
"water")),
class = "data.frame", row.names = c(NA, -3L))
df
#> recipe_id ingredients
#> 1 1 apple, banana, cucumber, water
#> 2 2 apple, meat, water
#> 3 3 water
Created on 2022-03-07 by the reprex package (v2.0.1)
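The one-liner above returns columns named Var1 and Freq, sorted alphabetically, whereas the question asks for columns named ingredient and count with the most frequent item first. A minimal base R sketch of that last step follows. The names items, counts, ord and result are introduced here for illustration only, and splitting on a bare comma followed by trimws() is an assumption to tolerate irregular spacing.

```r
# Same sample data as in the answer above
df <- data.frame(
  recipe_id = 1:3,
  ingredients = c("apple, banana, cucumber, water",
                  "apple, meat, water",
                  "water"),
  stringsAsFactors = FALSE
)

# Split every string on commas, flatten to one vector, strip stray whitespace
items <- trimws(unlist(strsplit(df$ingredients, ",", fixed = TRUE)))

# Tabulate the occurrences of each distinct item
counts <- table(items)

# Order by count (descending), breaking ties alphabetically by name
ord <- order(-as.integer(counts), names(counts))

result <- data.frame(
  ingredient = names(counts)[ord],
  count = as.integer(counts)[ord],
  stringsAsFactors = FALSE
)
result
```

Ordering explicitly on both the count and the name keeps the row order deterministic when several ingredients share the same count.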
With functions from the tidyverse:
library(tidyverse)
df %>%
separate_rows(ingredients) %>%
count(ingredients, name = "count") %>%
arrange(desc(count))
# A tibble: 5 x 2
# ingredients count
# <chr> <int>
#1 water 3
#2 apple 2
#3 banana 1
#4 cucumber 1
#5 meat 1
A data.table approach could be:
library(data.table)
dt[, .(table(unlist(ingredients)))]
# V1 N
#1: apple 2
#2: banana 1
#3: cucumber 1
#4: meat 1
#5: water 3
Data
dt <- data.table(
"recipe_id" = 1:3,
"ingredients" = list(
c("apple", "banana", "cucumber", "water"),
c("apple", "meat", "water"),
c("water")
)
)
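The data.table answer above assumes ingredients is a list column. If the column instead holds comma-separated strings, as the printed example in the question suggests, the strings need to be split first. The following is a sketch under that assumption; dt_chr, long and counts are hypothetical names, and the separator is assumed to be exactly ", ".

```r
library(data.table)

# Hypothetical variant of the data where ingredients is a character column
dt_chr <- data.table(
  recipe_id = 1:3,
  ingredients = c("apple, banana, cucumber, water",
                  "apple, meat, water",
                  "water")
)

# Split each string and flatten into one long single-column table
long <- dt_chr[, .(ingredient = unlist(strsplit(ingredients, ", ", fixed = TRUE)))]

# Count rows per ingredient, then sort by count (descending) and name
counts <- long[, .(count = .N), by = ingredient][order(-count, ingredient)]
counts
```

Grouping with .N instead of table() keeps the result a regular data.table with named columns, so it can be sorted or joined without renaming V1 and N.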
A base R option using scan + table + as.data.frame:
> with(df, as.data.frame(table(trimws(scan(text = ingredients, what = "", sep = ",", quiet = TRUE)))))
Var1 Freq
1 apple 2
2 banana 1
3 cucumber 1
4 meat 1
5 water 3