Transaction list to basket data

I have a table like this:

ID    Productpurchased   Year
1A          Abc          2011
1A          Abc          2011       
1A          xyz          2011
1A          Abc          2012
2A          bcd          2013
2A          Abc          2013

The required output format is:

ID       Purchase basket     Year     Abc-count  xyz-count  bcd-count    
1A       (Abc,xyz)           2011      2           1          0
1A       (Abc)               2012      1           0          0
2A       (bcd , Abc)         2013      1           0          1

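We can do this easily with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'ID' and 'Year', paste the unique elements of 'Productpurchased' together and assign (:=) the result to create the 'Purchase_basket' column. Then dcast from 'long' to 'wide', specifying fun.aggregate as length.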

library(data.table)
dcast(setDT(df1)[, Purchase_basket := toString(unique(Productpurchased)),.(ID, Year)],
       ID + Year + Purchase_basket ~paste0(Productpurchased, ".count"), length)
#    ID Year Purchase_basket Abc.count bcd.count xyz.count
#1: 1A 2011        Abc, xyz         2         0         1
#2: 1A 2012             Abc         1         0         0
#3: 2A 2013        bcd, Abc         1         1         0
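The one-liner above can be unpacked into separate steps, which may be easier to follow. This is a minimal sketch that assumes df1 holds the sample data from the question (the answer does not show how df1 was built, so the data.frame() call here is an assumption); naming fun.aggregate and value.var explicitly is a readability choice, not something the original code requires.

```r
library(data.table)

# Sample data from the question; building it with data.frame() is an assumption.
df1 <- data.frame(
  ID = c("1A", "1A", "1A", "1A", "2A", "2A"),
  Productpurchased = c("Abc", "Abc", "xyz", "Abc", "bcd", "Abc"),
  Year = c(2011, 2011, 2011, 2012, 2013, 2013),
  stringsAsFactors = FALSE
)

setDT(df1)  # convert to data.table by reference, no copy

# Step 1: within each (ID, Year) group, join the unique products into one string.
df1[, Purchase_basket := toString(unique(Productpurchased)), by = .(ID, Year)]

# Step 2: reshape long to wide, counting the rows for each product.
res <- dcast(
  df1,
  ID + Year + Purchase_basket ~ paste0(Productpurchased, ".count"),
  fun.aggregate = length,
  value.var = "Productpurchased"
)
print(res)
```

Because step 1 uses := it adds the column in place, so df1 itself keeps 'Purchase_basket' after the call.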

The same logic as the data.table approach, but using dplyr:

df_2 <- read.table(text = 'ID    Productpurchased   Year
1A          Abc          2011
1A          Abc          2011       
1A          xyz          2011
1A          Abc          2012
2A          bcd          2013
2A          Abc          2013',
header = TRUE, stringsAsFactors = FALSE)



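library(dplyr)

df_2 %>% group_by(ID, Year) %>%
  # flag each row by product (grepl matches substrings, so names must not overlap)
  mutate(Abc_count = grepl("Abc", Productpurchased),
         bcd_count = grepl("bcd", Productpurchased),
         xyz_count = grepl("xyz", Productpurchased)) %>%
  # collapse each group: build the basket string and sum the flags into counts
  summarise(Productpurchased = paste("(", paste(unique(Productpurchased), collapse = ","), ")", sep = ""),
            Abc_count = sum(Abc_count),
            bcd_count = sum(bcd_count),
            xyz_count = sum(xyz_count))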
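The dplyr version above hard-codes the three product names. If the set of products is not known in advance, one possible variant is to count rows per product and then spread the counts into columns. This is only a sketch: it assumes the tidyr package is also available (the original answer does not use it), and the name res is illustrative.

```r
library(dplyr)
library(tidyr)  # assumption: not used in the original answer

# Sample data from the question.
df_2 <- data.frame(
  ID = c("1A", "1A", "1A", "1A", "2A", "2A"),
  Productpurchased = c("Abc", "Abc", "xyz", "Abc", "bcd", "Abc"),
  Year = c(2011, 2011, 2011, 2012, 2013, 2013),
  stringsAsFactors = FALSE
)

res <- df_2 %>%
  group_by(ID, Year) %>%
  # build the basket string per (ID, Year) group
  mutate(Purchase_basket = paste0("(", paste(unique(Productpurchased), collapse = ","), ")")) %>%
  ungroup() %>%
  # one row per group and product, with the row count in column n
  count(ID, Year, Purchase_basket, Productpurchased) %>%
  # one column per product; absent products get a count of 0
  pivot_wider(names_from = Productpurchased, values_from = n,
              values_fill = 0L, names_glue = "{Productpurchased}_count")
print(res)
```

Unlike the grepl approach, this matches product names exactly, so a product such as "Abcd" would get its own column instead of being counted under "Abc".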