Reshape a data frame with repeating rows

I have a data frame that looks like this:

master_bill_no  category
SBA5100008  CONDOMS
SBA5100008  HAND CREAM
SBA5100009  PREGNANCY TESTS
SBA5100010  MULTI VITAMINS & MIN
SBA5100010  CALCIUM PREPARATIONS
SBA5100010  VITAMINS
SBA5100010  BETABLOCKERS

A reproducible example is given below:

df <- structure(list(master_bill_no = c("SBA5100008", "SBA5100008", 
"SBA5100009", "SBA5100010", "SBA5100010", "SBA5100010", "SBA5100010"
), category = c("CONDOMS", "HAND CREAM", "PREGNANCY TESTS", "MULTI VITAMINS & MIN", 
"CALCIUM PREPARATIONS", "VITAMINS", "BETABLOCKERS")), .Names = c("master_bill_no", 
"category"), class = "data.frame", row.names = c(NA, -7L))

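For each unique master bill number, I am trying to reshape the category column into wide format, so that all of its categories end up in a single row.

For example, the desired output is: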

master_bill_no  category
SBA5100008  CONDOMS,HAND CREAM
SBA5100009  PREGNANCY TESTS
SBA5100010  MULTI VITAMINS & MIN,CALCIUM PREPARATIONS,VITAMINS,BETABLOCKERS

I used the base reshape function, but it just drops the category column.

reshape(df, idvar = "master_bill_no", timevar = "category", direction = "wide")

I also tried the aggregate function:

aggregate(df, master_bill_no, FUN = paste(category, sep = ","))

This returns the error message "object 'category' not found".

I'm sure the reason reshape does this is that it is looking for a value to fill in. Can anyone help?

IMHO it is best to use base functions such as aggregate. The correct syntax should be:

aggregate(df$category, by=list(df$master_bill_no), FUN = paste)
#        ( the field ,    list of 'group by'     , the fun to operate on field )

> df
  master_bill_no             category
1     SBA5100008              CONDOMS
2     SBA5100008           HAND CREAM
3     SBA5100009      PREGNANCY TESTS
4     SBA5100010 MULTI VITAMINS & MIN
5     SBA5100010 CALCIUM PREPARATIONS
6     SBA5100010             VITAMINS
7     SBA5100010         BETABLOCKERS


> aggregate(df$category, by=list(df$master_bill_no), FUN = paste)
     Group.1                                                                  x
1 SBA5100008                                                CONDOMS, HAND CREAM
2 SBA5100009                                                    PREGNANCY TESTS
3 SBA5100010 MULTI VITAMINS & MIN, CALCIUM PREPARATIONS, VITAMINS, BETABLOCKERS
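
One caveat about the output above: with plain FUN = paste, the x column is a list of character vectors that only prints with commas. It is not one string per bill. If you need a single comma-separated string, as in the desired output, you can pass collapse through to paste. Below is a minimal sketch under that assumption. It uses the formula interface of aggregate, which keeps the original column names. The result name res is just a placeholder I chose.

```r
# Sample data from the question
df <- data.frame(
  master_bill_no = c("SBA5100008", "SBA5100008", "SBA5100009",
                     "SBA5100010", "SBA5100010", "SBA5100010", "SBA5100010"),
  category = c("CONDOMS", "HAND CREAM", "PREGNANCY TESTS",
               "MULTI VITAMINS & MIN", "CALCIUM PREPARATIONS",
               "VITAMINS", "BETABLOCKERS"),
  stringsAsFactors = FALSE
)

# Extra arguments after FUN are forwarded to it, so collapse = ","
# makes paste() return one string per master_bill_no
res <- aggregate(category ~ master_bill_no, data = df,
                 FUN = paste, collapse = ",")

print(res)
```

This should give three rows, one per master_bill_no, with category as an ordinary character column, so it can be written to CSV or joined like any other column.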