Transaction list to basket data
I have a table like this:
ID Productpurchased Year
1A Abc 2011
1A Abc 2011
1A xyz 2011
1A Abc 2012
2A bcd 2013
2A Abc 2013
The required output format is:
ID Purchase basket Year Abc-count xyz-count bcd-count
1A (Abc,xyz) 2011 2 1 0
1A (Abc) 2012 1 0 0
2A (bcd , Abc) 2013 1 0 1
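We can do this easily with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'ID' and 'Year', paste the unique elements of 'Productpurchased' together and assign (:=) the result to create the 'Purchase_basket' column, then dcast from 'long' to 'wide', specifying fun.aggregate as length.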
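library(data.table)
# df1 is the data.frame from the question (columns ID, Productpurchased, Year).
# Add the basket column by group, then cast long to wide, counting rows per product.
dcast(setDT(df1)[, Purchase_basket := toString(unique(Productpurchased)), .(ID, Year)],
      ID + Year + Purchase_basket ~ paste0(Productpurchased, ".count"), length)
#    ID Year Purchase_basket Abc.count bcd.count xyz.count
# 1: 1A 2011        Abc, xyz         2         0         1
# 2: 1A 2012             Abc         1         0         0
# 3: 2A 2013        bcd, Abc         1         1         0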
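For readers who want to run the one-liner, the same steps can be unpacked as a rough sketch. It assumes df1 is rebuilt from the question's sample rows; the intermediate names dt, count_col and wide are illustrative and not part of the original answer.

```r
library(data.table)

# Sample data from the question (assumed to be what df1 holds).
df1 <- data.frame(
  ID = c("1A", "1A", "1A", "1A", "2A", "2A"),
  Productpurchased = c("Abc", "Abc", "xyz", "Abc", "bcd", "Abc"),
  Year = c(2011, 2011, 2011, 2012, 2013, 2013),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df1)  # a copy, so df1 itself is left unchanged

# Step 1: per (ID, Year) group, collapse the distinct products into one string.
dt[, Purchase_basket := toString(unique(Productpurchased)), by = .(ID, Year)]

# Step 2: name of the wide column each row will be counted under.
dt[, count_col := paste0(Productpurchased, ".count")]

# Step 3: long to wide; length() counts the rows in each cell,
# and combinations with no rows are filled with 0.
wide <- dcast(dt, ID + Year + Purchase_basket ~ count_col,
              value.var = "Productpurchased", fun.aggregate = length)
print(wide)
```

Using as.data.table() instead of setDT() is a deliberate choice in this sketch: setDT() converts df1 in place, which is faster but changes the original object.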
The same logic as the data.table approach, but using dplyr.
library(dplyr)

df_2 <- read.table(text = 'ID Productpurchased Year
1A Abc 2011
1A Abc 2011
1A xyz 2011
1A Abc 2012
2A bcd 2013
2A Abc 2013',
header = TRUE, stringsAsFactors = FALSE)

df_2 %>%
  group_by(ID, Year) %>%
  # Flag each row by product (the product names are hardcoded here)
  mutate(Abc_count = grepl("Abc", Productpurchased),
         bcd_count = grepl("bcd", Productpurchased),
         xyz_count = grepl("xyz", Productpurchased)) %>%
  # Collapse each group: basket string plus the number of flagged rows per product
  summarise(Productpurchased = paste("(", paste(unique(Productpurchased), collapse = ","), ")", sep = ""),
            Abc_count = sum(Abc_count),
            bcd_count = sum(bcd_count),
            xyz_count = sum(xyz_count))
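The dplyr version above hardcodes the three product names. One possible variant that avoids this, sketched under the assumption that a reasonably recent tidyr (1.1 or later) is also installed, counts rows per product with count() and spreads them with pivot_wider(). The names basket_counts and Purchase_basket are illustrative choices, not from the original answer.

```r
library(dplyr)
library(tidyr)

df_2 <- read.table(text = 'ID Productpurchased Year
1A Abc 2011
1A Abc 2011
1A xyz 2011
1A Abc 2012
2A bcd 2013
2A Abc 2013',
                   header = TRUE, stringsAsFactors = FALSE)

basket_counts <- df_2 %>%
  group_by(ID, Year) %>%
  # Distinct products of the group, formatted like "(Abc,xyz)".
  mutate(Purchase_basket = paste0("(", paste(unique(Productpurchased), collapse = ","), ")")) %>%
  ungroup() %>%
  # One row per (ID, Year, basket, product), with the occurrence count in n.
  count(ID, Year, Purchase_basket, Productpurchased) %>%
  # One column per product; products absent from a basket get 0.
  pivot_wider(names_from = Productpurchased, values_from = n,
              names_glue = "{Productpurchased}_count", values_fill = 0L)

print(basket_counts)
```

Because the count columns are derived from the data, a new product in the input would get its own column without any change to the code.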