Is there an R function to add columns by repeating the value of another column?

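I want to run an association analysis, but first I need to convert my data frame into the right format, one that shows only the transactions. 1) How can I repeat my "Sub-Category" column as many times as the number in the "Quantity" column?

2) How can I group the transactions by Order ID?

I have this df: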

  `Order ID`     `Sub-Category` `Quantity`
  <chr>          <chr>               <dbl>
1 CA-2017-152156 Bookcases               2
2 CA-2017-152156 Chairs                  3
3 CA-2017-138688 Labels                  2

1) I would like this:

  `Order ID`     `Sub-Category` `Sub-Category2` `Sub-Category3`
  <chr>          <chr>          <chr>           <chr>
1 CA-2017-152156 Bookcases      Bookcases       NULL
2 CA-2017-152156 Chairs         Chairs          Chairs
3 CA-2017-138688 Labels         Labels          NULL

(Afterwards I would like to merge the rows that share an Order ID, for example rows 1 and 2. Do you have a hint for that?) Thank you!

The following answers point 1).

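# Widest row needed: the largest Quantity
Max <- max(df1$Quantity)
# For each row, repeat the Sub-Category Quantity times and pad with NA up to Max
res <- lapply(seq_len(nrow(df1)), function(i){
  c(rep(as.character(df1[i, 2]), df1[i, 3]), rep(NA, Max - df1[i, 3]))
})
# Bind the padded rows into a matrix and attach it to the Order ID column
res <- cbind(df1[1], do.call(rbind, res))
# New columns are named "1", "2", ...; prefix them with "Sub-Category"
names(res)[-1] <- paste0(names(df1)[2], names(res)[-1])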

res
#        Order ID Sub-Category1 Sub-Category2 Sub-Category3
#1 CA-2017-152156     Bookcases     Bookcases          <NA>
#2 CA-2017-152156        Chairs        Chairs        Chairs
#3 CA-2017-138688        Labels        Labels          <NA>

Data in dput format.

df1 <-
structure(list(`Order ID` = structure(c(2L, 2L, 1L), 
.Label = c("CA-2017-138688", "CA-2017-152156"), 
class = "factor"), `Sub-Category` = structure(1:3, 
.Label = c("Bookcases", "Chairs", "Labels"), class = 
"factor"), Quantity = c(2L, 3L, 2L)), class = "data.frame", 
row.names = c("1", "2", "3"))
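
A variant of the same padding idea is sketched below, in case the data arrives as a tibble: `[i, j]` on a tibble returns a tibble rather than a bare value, so the row-wise indexing above may not behave the same way there. This sketch pulls the columns out with `$`, which gives a plain vector in both cases. The helper name `pad_rep` and the object names `rows`, `wide` and `res2` are made up for illustration, and the data is rebuilt inline with character columns.

```r
# Same three rows as the question, rebuilt with character columns.
df1 <- data.frame(
  `Order ID`     = c("CA-2017-152156", "CA-2017-152156", "CA-2017-138688"),
  `Sub-Category` = c("Bookcases", "Chairs", "Labels"),
  Quantity       = c(2L, 3L, 2L),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: repeat x `times` times, then pad with NA up to `width`.
pad_rep <- function(x, times, width) {
  c(rep(x, times), rep(NA_character_, width - times))
}

# Number of Sub-Category columns needed.
width <- max(df1$Quantity)

# One padded character vector per row; USE.NAMES = FALSE avoids row names.
rows <- mapply(pad_rep, as.character(df1$`Sub-Category`), df1$Quantity,
               MoreArgs = list(width = width),
               SIMPLIFY = FALSE, USE.NAMES = FALSE)

# Stack the vectors into a matrix and name its columns.
wide <- do.call(rbind, rows)
colnames(wide) <- paste0("Sub-Category", seq_len(width))

# Attach the Order ID column; check.names = FALSE keeps the names as written.
res2 <- data.frame(df1["Order ID"], wide,
                   check.names = FALSE, stringsAsFactors = FALSE)
res2
```

Using `do.call(rbind, ...)` on a list, rather than letting `mapply` simplify, keeps the result a matrix with one row per input row even when the largest Quantity is 1.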

To answer question 1) with the tidyverse, one approach is to create a new column that repeats each Sub-Category Quantity times, store it as a comma-separated string, and then separate that string into n columns.

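library(tidyverse)

# df is the data frame from the question; n is the number of columns needed
n <- max(df$Quantity)

df1 <- df %>%
         # Build e.g. "Chairs,Chairs,Chairs" for a Quantity of 3
         mutate(new = map2_chr(`Sub-Category`, Quantity, ~paste(rep(.x, .y), collapse = ","))) %>%
         # Split on the comma only; fill = "right" pads short rows with NA
         separate(new, paste("Sub-Category", seq_len(n)), sep = ",", fill = "right") %>%
         select(-`Sub-Category`)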

df1

#       Order ID  Quantity Sub-Category 1 Sub-Category 2 Sub-Category 3
#1 CA-2017-152156        2      Bookcases      Bookcases           <NA>
#2 CA-2017-152156        3         Chairs         Chairs         Chairs
#3 CA-2017-138688        2         Labels         Labels           <NA>

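As for question 2), I'm not 100% clear on what you are looking for (since there is no expected output), but I think you want to group_by Order ID and collapse each group's categories into one row?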

df1 %>%
  group_by(`Order ID`) %>%
  summarise_at(vars(starts_with("Sub")), list(~paste(na.omit(.), collapse = ",")))

# A tibble: 2 x 4
#  `Order ID`   `Sub-Category 1` `Sub-Category 2` `Sub-Category 3`
#  <fct>          <chr>            <chr>            <chr>           
#1 CA-2017-138688 Labels           Labels           ""              
#2 CA-2017-152156 Bookcases,Chairs Bookcases,Chairs Chairs
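
If the goal is simply one transaction per order, a base R alternative is sketched here; it is not part of the answer above, just another way to read question 2). It expands each item by its quantity with `rep()` and then splits the items by order ID. The result is a named list with one character vector per order, a shape that association-analysis functions often accept, though whether it fits your particular package is an assumption to check. The names `items`, `orders`, `transactions` and `per_order` are illustrative, and the data is rebuilt inline with character columns.

```r
# Same three rows as the question, rebuilt with character columns.
df1 <- data.frame(
  `Order ID`     = c("CA-2017-152156", "CA-2017-152156", "CA-2017-138688"),
  `Sub-Category` = c("Bookcases", "Chairs", "Labels"),
  Quantity       = c(2L, 3L, 2L),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Repeat each item and its order ID Quantity times (long format).
items  <- rep(df1$`Sub-Category`, times = df1$Quantity)
orders <- rep(df1$`Order ID`,     times = df1$Quantity)

# One character vector of items per order; list names are the order IDs.
transactions <- split(items, orders)
transactions

# Optional: collapse each order into a single comma-separated string.
per_order <- vapply(transactions, paste, character(1), collapse = ",")
per_order
```

Note that `split()` orders the list by the sorted order IDs, so CA-2017-138688 comes before CA-2017-152156 regardless of the row order in the data.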