Rearranging data in R for market basket analysis
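I have data available in CSV format.
The data is laid out as follows, with the receipt number in one column and the product in the corresponding column: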
Receipt_no Product
A1 Apple
A1 Banana
A1 Orange
A2 Pineapple
A2 Jackfruit
A3 Cola
A3 Tea
I want to rearrange it as:
A1 , Apple, Banana, Orange
A2 , Pineapple, Jackfruit
A3 , Cola, Tea
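That is, the receipt number and its product names on a single line, separated by commas. Because the data set is large, I want to do this rearrangement in R.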
Please help.
Thanks.
Regards,
尼斯
Base R:
aggregate(Product ~ Receipt_no, df, paste, collapse = ',')
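The aggregate() call above returns a two-column data frame rather than the single comma-separated lines shown in the question. As a minimal sketch, assuming the columns are plain character vectors (the sample data frame and the names res and basket_lines are illustrative, not from the original answer), the receipt number can be pasted in front of each collapsed product string:

```r
# Sample data in the same shape as the question (character columns assumed)
df <- data.frame(
  Receipt_no = c("A1", "A1", "A1", "A2", "A2", "A3", "A3"),
  Product = c("Apple", "Banana", "Orange", "Pineapple", "Jackfruit", "Cola", "Tea"),
  stringsAsFactors = FALSE
)

# One row per receipt, with its products collapsed into a single string
res <- aggregate(Product ~ Receipt_no, df, paste, collapse = ", ")

# Prepend the receipt number to get "receipt, item, item, ..." lines
basket_lines <- paste(res$Receipt_no, res$Product, sep = ", ")
print(basket_lines)
```

Each element of basket_lines is then one line in the requested layout.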
Using dplyr:
library(dplyr)

df %>%
  group_by(Receipt_no) %>%
  summarise(new = paste(Product, collapse = ','))
Using base R:
u <- as.vector(unique(df$Receipt_no))
as.list(sapply(u, function(x) paste0(x, ", ", paste0(subset(df$Product, df$Receipt_no==x), collapse = ", "))))
# $A1
# [1] "A1, Apple, Banana, Orange"
# $A2
# [1] "A2, Pineapple, Jackfruit"
# $A3
# [1] "A3, Cola, Tea"
Data
df <- structure(list(Receipt_no = structure(c(1L, 1L, 1L, 2L, 2L, 3L,
3L), .Label = c("A1", "A2", "A3"), class = "factor"), Product = structure(c(1L,
2L, 5L, 6L, 4L, 3L, 7L), .Label = c("Apple", "Banana", "Cola",
"Jackfruit", "Orange", "Pineapple", "Tea"), class = "factor")), .Names = c("Receipt_no",
"Product"), class = "data.frame", row.names = c(NA, -7L))
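Since the question starts from a CSV file and wants the rearranged lines back out, the full round trip might look like the sketch below. It is only an illustration under stated assumptions: the file paths are temporary files created for the example, the CSV is assumed to have a header row with the columns Receipt_no and Product, and the names in_file, out_file, baskets and basket_lines are hypothetical.

```r
# Write a small sample CSV so the example is self-contained (hypothetical path)
in_file <- tempfile(fileext = ".csv")
sample_df <- data.frame(
  Receipt_no = c("A1", "A1", "A1", "A2", "A2", "A3", "A3"),
  Product = c("Apple", "Banana", "Orange", "Pineapple", "Jackfruit", "Cola", "Tea"),
  stringsAsFactors = FALSE
)
write.csv(sample_df, in_file, row.names = FALSE)

# Read the CSV, keeping both columns as character vectors
df <- read.csv(in_file, stringsAsFactors = FALSE)

# Group the products by receipt number
baskets <- split(df$Product, df$Receipt_no)

# Build one "receipt, item, item, ..." line per receipt
basket_lines <- vapply(
  names(baskets),
  function(id) paste(c(id, baskets[[id]]), collapse = ", "),
  character(1)
)

# Write the lines to a text file, one receipt per line (hypothetical path)
out_file <- tempfile(fileext = ".csv")
writeLines(basket_lines, out_file)
```

Here split() keeps the products in their original row order within each receipt, and writeLines() emits the strings without quoting.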